Integrations

How to write an integration spec a vendor cannot misinterpret

A practitioner template for integration specs that force decisions before code is written — idempotency, retry semantics, payload contracts, error taxonomy, and the observability hooks that make partner blame games short.

Author: DevLume
Published: 22 May 2026

Key takeaways

Most integration disputes are not technical disagreements but ambiguity in writing. Retry behaviour is the clearest case: the IETF is standardising an Idempotency-Key header because duplicate and failed requests are hard to reconcile when client and server assume different rules (IETF httpapi-idempotency-key-header).

Specify error semantics in RFC 9457 Problem Details format. It is the current IETF standard for machine-readable error bodies and replaces the older RFC 7807 (RFC 9457).

State the retry budget, backoff curve, and jitter strategy explicitly. AWS's published guidance recommends exponential backoff with jitter. Name that algorithm in the spec, with numbers attached, so the vendor cannot debate it later (AWS Architecture Blog).

Use the OpenAPI 3.1 schema as the payload contract source of truth, not prose. Prose is where misinterpretation lives (OpenAPI Specification 3.1).

Make observability part of the contract: correlation header name, log retention, and the dashboard URL the vendor will share with you during incidents. If it is not in the spec, you will not have it at 2 a.m.

TL;DR

A useful integration specification, one that a vendor cannot reasonably misinterpret, has a small number of properties. It defines the request and response payloads as OpenAPI schemas, not as paragraphs. It names an idempotency-key header and states, in one sentence, what the server must do when it sees a duplicate. It publishes the retry budget, the backoff curve, and the jitter strategy as numbers the vendor can program against. Its error bodies follow RFC 9457 Problem Details. Authentication, rate limits, and webhook signing each get their own dated subsection with a worked example. Critically, it also assigns a named owner on both sides and a single source-of-truth URL (usually a Git repository with a versioned openapi.yaml), so that "what does the spec say" is never an opinion. The rest of this piece is a section-by-section template you can adapt, with the questions that force decisions out of a vendor before any code is written.

Why ambiguity is the bug

The first time a B2B integration project goes badly, the post-mortem language is almost always wrong. Engineers say "the vendor's API was buggy." Vendors say "the integrator misused the API." Both are usually describing the same artefact: a spec that did not commit to a behaviour, and two teams that filled the gap with different defaults.

Idempotency is the canonical example. The IETF has an Internet-Draft for it (the Idempotency-Key header), and the draft's own problem statement is unambiguous: "Distributed and decentralized nature of various systems make it harder to detect duplicates and reconcile failed transactions" and idempotency keys "allow API providers to ensure exactly-once semantics for client requests" (IETF httpapi-idempotency-key-header). A standards body had to take up the question at all, which tells you how often retry behaviour has been left to each side's defaults.

A spec is not a wishlist. It is the document you reach for at 2 a.m. when production is double-charging customers and you need to point at one paragraph that says "the server must not produce more than one side-effect per idempotency key, even across retries, for at least 24 hours." If that paragraph is not there, you do not have a spec — you have a hope.

The template, section by section

What follows is the structure I use on engagements. It is opinionated, and it deliberately forces vendor product teams to answer questions they would prefer to keep vague. The goal is not bureaucratic completeness; it is decision-forcing. Every section ends with the question I send back if the vendor's response is missing.

1. Scope, ownership, and source of truth

State which integration this is, which business flows it supports, and what is explicitly out of scope. Name a primary engineering owner on each side and a single Git-hosted URL where the canonical openapi.yaml lives. Version the spec by date and Git tag — never call anything "v1 final."

Ask: Where will the machine-readable schema live, who can write to it, and how do schema changes get communicated? If the vendor's answer is "we'll send you a PDF when there are updates," you have a process problem and you should price it in.

2. Transport, authentication, and tenancy

Specify the protocol (REST/HTTPS, gRPC, message broker), the authentication scheme (OAuth 2.1 client credentials, mTLS, signed-request HMAC), and how tenancy is conveyed on every request. OAuth 2.1 is still an IETF draft, but it consolidates OAuth 2.0 with its later security best practices and is the right default for new server-to-server integrations (draft-ietf-oauth-v2-1). For HMAC-signed webhooks, name the algorithm, the canonicalisation rules, and the header.
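The sketch below shows what "how tenancy is conveyed on every request" can look like in practice. It is a minimal Python example of the client-credentials grant as OAuth 2.0 defines it (RFC 6749 §4.4), which the OAuth 2.1 draft carries forward. It builds the token request without sending it. The token URL, the scope, and the X-Tenant-ID header are hypothetical placeholders for whatever your spec names.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical values for illustration; the real ones belong in the spec.
TOKEN_URL = "https://auth.vendor.example/oauth2/token"
TENANT_HEADER = "X-Tenant-ID"


def build_token_request(client_id: str, client_secret: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    Client authentication uses HTTP Basic, with the id and secret each
    form-urlencoded first (RFC 6749 section 2.3.1).
    """
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": scope})
    return urllib.request.Request(
        TOKEN_URL,
        data=body.encode("ascii"),
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def api_headers(access_token: str, tenant_id: str) -> dict[str, str]:
    """Headers every API call carries: the bearer token plus explicit tenancy."""
    return {"Authorization": f"Bearer {access_token}", TENANT_HEADER: tenant_id}
```

Carrying the tenant in an explicit header, rather than inferring it from the token, is one design choice among several. The spec's job is to say which one applies.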

Ask: What is the rotation cadence for credentials, and what is the documented procedure when a secret is suspected compromised? If there is no documented procedure, write one and append it.

3. Payload contract: OpenAPI, not prose

Pin the request and response payloads to an OpenAPI 3.1 schema (OpenAPI Specification 3.1). Version 3.1 aligns its Schema Object with JSON Schema Draft 2020-12. This matters because it removes the historical incompatibilities that used to bite teams sharing validators between producer and consumer.

The body of this section is a single sentence: "All request and response payloads are defined in openapi.yaml at the canonical URL above. This document does not duplicate field-level descriptions. Where prose and schema disagree, the schema is authoritative." That sentence alone resolves a category of disputes.

Ask: Which fields are nullable, which are optional, and what is the server's behaviour on unknown fields — reject, ignore, or echo? Make the vendor commit to one.
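Each of the three answers produces observably different server behaviour, which is why the vendor has to commit to one. The sketch below is minimal Python and checks top-level fields only. The REJECT policy corresponds roughly to `additionalProperties: false` in an OpenAPI 3.1 schema. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Any


class UnknownFieldPolicy(Enum):
    REJECT = "reject"
    IGNORE = "ignore"
    ECHO = "echo"


class UnknownFieldError(ValueError):
    """Raised under REJECT when the payload carries undeclared fields."""


def apply_policy(payload: dict[str, Any], declared: set[str],
                 policy: UnknownFieldPolicy) -> dict[str, Any]:
    """Return the payload the server will act on (top-level fields only)."""
    unknown = sorted(set(payload) - declared)
    if not unknown:
        return dict(payload)
    if policy is UnknownFieldPolicy.REJECT:
        raise UnknownFieldError(f"unknown fields: {', '.join(unknown)}")
    if policy is UnknownFieldPolicy.IGNORE:
        # Drop undeclared fields silently.
        return {k: v for k, v in payload.items() if k in declared}
    # ECHO: keep unknown fields so the server can store and return them unchanged.
    return dict(payload)
```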

4. Idempotency

This is the section that most often does not exist. The shape used by Stripe is the de facto industry pattern and is worth copying directly: the client sends an Idempotency-Key header with a UUID per logical operation; the server stores the request fingerprint and response for at least 24 hours; on a duplicate key with a matching fingerprint, the server replays the stored response; on a duplicate key with a mismatched fingerprint, the server returns a specific error (Stripe API Reference — Idempotent Requests).

Specify exactly:

The header name (default to Idempotency-Key per the IETF draft).
The minimum retention window (24 hours is the Stripe-shaped default; 7 days is safer for batch integrations).
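The behaviour on a mismatched fingerprint. The IETF draft recommends 422 with a problem+json body, and reserves 409 for a retry that arrives while the original request is still being processed.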
Whether the key is global, scoped to the API token, or scoped to the tenant.
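To see how those four decisions interact, here is a minimal in-memory Python sketch of the server side. It assumes tenant-scoped keys and a 24-hour window. It follows the IETF draft's status split: 422 when a key is reused with a different payload, 409 when a retry arrives while the original is still running. If your spec pins different codes, change those two returns. The class name and the problem type URIs are hypothetical. A production server needs a shared, durable store with an atomic insert.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

Response = tuple[int, dict[str, Any]]
RETENTION_SECONDS = 24 * 60 * 60  # minimum window from the spec (7 days for batch)
PROBLEM_BASE = "https://errors.example.com/"  # hypothetical type-URI namespace


@dataclass
class Entry:
    fingerprint: str
    created_at: float
    response: Optional[Response] = None  # None while the first attempt is running


def fingerprint(method: str, path: str, body: dict[str, Any]) -> str:
    """SHA-256 over a canonical form, so JSON key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{method} {path}\n{canonical}".encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory, single-process sketch of idempotent request handling."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._entries: dict[tuple[str, str], Entry] = {}
        self._clock = clock

    def handle(self, tenant: str, key: str, method: str, path: str,
               body: dict[str, Any], execute: Callable[[], Response]) -> Response:
        now = self._clock()
        fp = fingerprint(method, path, body)
        slot = (tenant, key)  # keys are scoped to the tenant
        entry = self._entries.get(slot)
        if entry is not None and now - entry.created_at > RETENTION_SECONDS:
            entry = None  # retention window has passed; the key may be reused
        if entry is None:
            entry = Entry(fp, now)
            self._entries[slot] = entry
            try:
                entry.response = execute()  # the only path that runs the side-effect
            except Exception:
                del self._entries[slot]  # nothing stored, so a retry may run it again
                raise
            return entry.response
        if entry.fingerprint != fp:
            return 422, {"type": PROBLEM_BASE + "idempotency-key-mismatch",
                         "title": "Idempotency-Key reused with a different request",
                         "status": 422}
        if entry.response is None:
            return 409, {"type": PROBLEM_BASE + "idempotency-key-in-flight",
                         "title": "Original request is still being processed",
                         "status": 409}
        return entry.response  # replay the stored response; no new side-effect
```

Stripe's documentation says the stored result is replayed whether the first attempt succeeded or failed. This sketch does the same for any response `execute()` returns, but it forgets requests that raised.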

Ask: Are GET and DELETE requests treated as inherently idempotent per RFC 9110, or do they also accept the idempotency header? RFC 9110 defines PUT, DELETE, and the safe methods (GET, HEAD, OPTIONS, TRACE) as idempotent (RFC 9110 §9.2.2), but vendors disagree on whether the header is allowed on those methods. Pick a side.

5. Retry semantics, backoff, and jitter

State the client's retry budget and the server's expectation in numbers. The backoff wording follows AWS's published architecture guidance on exponential backoff with jitter (AWS Architecture Blog). Around it I add a maximum delay, a maximum attempt count, and a rule against retrying 4xx responses other than 408, 425, and 429.

A concrete clause for the spec:

Clients SHOULD retry transient failures using full-jitter exponential backoff with a base of 200 ms and a cap of 30 s, for a maximum of 6 attempts within a 10-minute budget. Retries MUST carry the same Idempotency-Key as the original request. Transient failures are connection errors, 5xx responses, 408 Request Timeout, 425 Too Early, and 429 Too Many Requests; clients MUST NOT retry any other 4xx response.
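The clause is precise enough to program against directly. The Python sketch below is one reading of it. "Full jitter" follows the AWS Architecture Blog's definition: each delay is drawn uniformly between zero and min(cap, base × 2^n). `send` is a hypothetical callable that returns a status code, or None on a connection error. Retry-After handling is left out for brevity; one reasonable choice is to wait for the larger of the jittered delay and the server's value.

```python
import random
import time
from typing import Callable, Optional

BASE_S = 0.2        # 200 ms
CAP_S = 30.0        # 30 s
MAX_ATTEMPTS = 6    # the first attempt plus five retries
BUDGET_S = 600.0    # 10-minute budget, counted here as total sleep time only
RETRYABLE_4XX = {408, 425, 429}


def is_retryable(status: Optional[int]) -> bool:
    """None stands for a connection error or timeout with no HTTP response."""
    return status is None or status in RETRYABLE_4XX or 500 <= status <= 599


def full_jitter_delay(retry: int, rng: random.Random) -> float:
    """Delay before retry N (0-based): uniform over [0, min(cap, base * 2**N)]."""
    return rng.uniform(0.0, min(CAP_S, BASE_S * 2 ** retry))


def call_with_retries(send: Callable[[str], Optional[int]], idempotency_key: str,
                      rng: random.Random,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call send() until success, a non-retryable status, or the limits run out."""
    slept = 0.0
    status = send(idempotency_key)
    for retry in range(MAX_ATTEMPTS - 1):
        if not is_retryable(status):
            break  # success, or a 4xx the clause forbids retrying
        delay = full_jitter_delay(retry, rng)
        if slept + delay > BUDGET_S:
            break
        sleep(delay)
        slept += delay
        status = send(idempotency_key)  # same key on every attempt
    return status
```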

The Retry-After header is the server's escape valve. RFC 9110 defines it precisely, including the two valid value forms (delta-seconds and HTTP-date) (RFC 9110 §10.2.3). State which form the server will use and require the client to honour it.
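Honouring Retry-After means the client must accept both forms unless the spec pins one. A minimal Python sketch using the standard library's email date parser follows. The function name is hypothetical. Clamping past dates to zero is a design choice, not something RFC 9110 mandates.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait per a Retry-After value, or None if it cannot be parsed."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):  # delta-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Fri, 22 May 2026 12:01:30 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always GMT
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```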

Ask: What is the documented rate-limit response — 429, what headers does it carry, and what does the server expect the client to do?

6. Error taxonomy

Errors are the most frequently misspecified surface. Pin them to RFC 9457 Problem Details for HTTP APIs, the current IETF standard (RFC 9457). RFC 9457 obsoletes RFC 7807 and adds a few things that matter in practice: a registry of common problem types, guidance on reporting multiple problems in one response, and advice on type URIs that cannot be dereferenced.

RFC 9457 defines five standard members for a problem+json body, all optional:

- type: a URI reference identifying the problem class; when absent, it defaults to about:blank.
- title: a short human-readable summary of the problem type.
- status: the HTTP status code.
- detail: a human-readable explanation specific to this occurrence.
- instance: a URI reference identifying the specific occurrence.

Pin the type URIs to a documented enumeration. That enumeration is the actual error taxonomy.
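A server-side helper that always emits the same shape keeps those members consistent across endpoints. The Python sketch below is minimal. The `errors` extension member in the usage is modelled on an example in RFC 9457 (a list of objects with `detail` and a JSON Pointer in `pointer`), but its exact shape is something your spec must pin. The function name and the type URI are hypothetical.

```python
import json
from typing import Any, Optional

PROBLEM_MEDIA_TYPE = "application/problem+json"
STANDARD_MEMBERS = {"type", "title", "status", "detail", "instance"}


def problem_response(type_uri: str, title: str, status: int,
                     detail: Optional[str] = None, instance: Optional[str] = None,
                     **extensions: Any) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for an RFC 9457 problem details response."""
    clash = extensions.keys() & STANDARD_MEMBERS
    if clash:
        raise ValueError(f"extension members shadow standard ones: {sorted(clash)}")
    body: dict[str, Any] = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    body.update(extensions)  # e.g. field-level validation errors
    return status, {"Content-Type": PROBLEM_MEDIA_TYPE}, json.dumps(body)
```

Usage: `problem_response("https://errors.example.com/validation-failed", "Request validation failed", 400, errors=[{"detail": "must be positive", "pointer": "#/amount"}])`.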

The minimum useful taxonomy has roughly a dozen entries: authentication failure, authorisation failure, validation failure, idempotency-key conflict, rate limit exceeded, resource not found, conflict, precondition failed, payload too large, unsupported media type, upstream timeout, and internal error. Each entry gets a stable type URI, an example body, and the documented client behaviour. The client behaviour matters: it is the part the integrator has to write, and absent guidance they will write something arbitrary.
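Writing the taxonomy as data, with the documented client behaviour beside each entry, makes the integrator's side mechanical rather than arbitrary. The Python sketch below covers the twelve entries listed above. The type-URI namespace, slugs, status codes, and action names are hypothetical. The fallback for unknown types reuses the retry rule from the retry section, so behaviour stays defined even when the vendor adds an entry without warning.

```python
from enum import Enum
from typing import Any

ERROR_BASE = "https://errors.example.com/"  # hypothetical type-URI namespace


class ClientAction(Enum):
    FIX_REQUEST = "fix the request; never resend it unchanged"
    REFRESH_CREDENTIALS = "obtain a new token, then retry once"
    RETRY_WITH_BACKOFF = "retry with backoff and the same Idempotency-Key"
    REFETCH_AND_RETRY = "re-read the resource, then resubmit"
    ESCALATE = "stop and page a human"


# slug -> (HTTP status, documented client behaviour); statuses are illustrative
TAXONOMY: dict[str, tuple[int, ClientAction]] = {
    "authentication-failed": (401, ClientAction.REFRESH_CREDENTIALS),
    "not-authorised": (403, ClientAction.ESCALATE),
    "validation-failed": (400, ClientAction.FIX_REQUEST),
    "idempotency-key-conflict": (422, ClientAction.FIX_REQUEST),
    "rate-limit-exceeded": (429, ClientAction.RETRY_WITH_BACKOFF),
    "not-found": (404, ClientAction.FIX_REQUEST),
    "conflict": (409, ClientAction.REFETCH_AND_RETRY),
    "precondition-failed": (412, ClientAction.REFETCH_AND_RETRY),
    "payload-too-large": (413, ClientAction.FIX_REQUEST),
    "unsupported-media-type": (415, ClientAction.FIX_REQUEST),
    "upstream-timeout": (504, ClientAction.RETRY_WITH_BACKOFF),
    "internal-error": (500, ClientAction.RETRY_WITH_BACKOFF),
}
RETRYABLE_STATUSES = {408, 425, 429}


def client_action(problem: dict[str, Any]) -> ClientAction:
    """Map a problem+json body to the behaviour the spec documents for it."""
    type_uri = str(problem.get("type", "about:blank"))
    if type_uri.startswith(ERROR_BASE):
        entry = TAXONOMY.get(type_uri[len(ERROR_BASE):])
        if entry is not None:
            return entry[1]
    # Unknown or about:blank types fall back on the status code.
    status = int(problem.get("status", 500))
    if status in RETRYABLE_STATUSES or status >= 500:
        return ClientAction.RETRY_WITH_BACKOFF
    return ClientAction.FIX_REQUEST
```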

Ask: For validation errors, will the body include field-level details, and in what shape? The RFC permits extension members; pin them.

7. Webhooks, signing, and replay protection

If the integration includes outbound webhooks, the spec needs its own section for them. Specify the signing algorithm (HMAC-SHA-256 with a per-tenant secret is the workable default), the canonicalisation (which headers, the body bytes, the timestamp), the signature header name, and the timestamp tolerance. Replay protection is a one-line clause: the receiver must reject any signed delivery whose timestamp is more than five minutes from its own clock.
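Here is a minimal Python sketch of both sides of that clause. It assumes a hypothetical canonicalisation of `<unix timestamp>.<raw body bytes>` and a hex-encoded signature. Your spec must name its own format and header names. The timestamp sits inside the signed bytes: otherwise an attacker could replay an old body with a fresh timestamp.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_S = 300  # five minutes, per the replay-protection clause


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA-256 over '<timestamp>.<raw body bytes>', as lowercase hex."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp_header: str, signature_header: str,
           body: bytes, now: Optional[float] = None) -> bool:
    """Receiver-side check: fresh timestamp plus a constant-time signature match."""
    try:
        timestamp = int(timestamp_header)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > TOLERANCE_S:
        return False  # stale or future-dated: treat as a possible replay
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected.encode("ascii"), signature_header.encode("utf-8"))
```

The receiver must verify against the raw body bytes as received, before any JSON parsing. Re-serialising the JSON can change the bytes and break the signature.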

Specify the retry policy for failed webhook deliveries: budget, backoff, dead-letter behaviour, and how the integrator can request a replay. If there is no replay mechanism, the section ends with that admission — and you make a delivery decision on the strength of it.

Ask: What is the documented behaviour when a webhook fingerprint matches a previously-delivered one — silent drop, error, or delivered with a flag? This is the inverse of inbound idempotency and it gets forgotten constantly.

8. Observability and incident response

Most integration specs stop before this section, which is exactly the section you reach for when something breaks. It contains, at minimum:

The correlation-ID header name and propagation rules. W3C Trace Context (traceparent and tracestate) is the right default — it is a finalised W3C recommendation and the standard most SDKs and gateways already emit (W3C Trace Context).
The vendor's log retention window for integration requests and the procedure for requesting historical traces.
A named dashboard URL the vendor will grant read access to during incidents.
The escalation path on both sides — who picks up the phone, who has the authority to declare a sev-1, what the documented response-time SLO is.
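"Propagation rules" for W3C Trace Context mostly come down to one thing: keep the incoming trace id and mint a new parent id for each outbound call. Here is a minimal Python sketch for version 00 of the traceparent header (`version-traceid-parentid-flags`). It rejects the all-zero and uppercase forms that the recommendation treats as invalid. It ignores tracestate and future header versions, and the function names are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent; None if malformed or all-zero ids."""
    match = TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def child_traceparent(incoming: Optional[str]) -> str:
    """Header for an outbound call: keep the trace id, mint a new parent id."""
    parsed = parse_traceparent(incoming) if incoming else None
    trace_id = parsed.trace_id if parsed else secrets.token_hex(16)
    flags = parsed.flags if parsed else "01"  # start a new, sampled trace
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```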

Ask: During an incident, what is the maximum time before a human on the vendor side acknowledges, and what is the documented escalation path beyond that?

9. Versioning and deprecation policy

State the version-in-URL or version-in-header decision and stick to it. State the deprecation window as well. A reasonable industry shape is six months for a change to a successful (2xx) response shape and twelve months for a removed endpoint. Require deprecation notices to be communicated both through the Sunset response header and by email to the named integration owner (RFC 8594).
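RFC 8594 defines the Sunset header's value as an HTTP-date. Computing that date from the announcement date keeps the header and the policy in step. The Python sketch below is minimal. The change-kind labels are hypothetical, the month counts come from the windows above, and the `announced` datetime is assumed to be timezone-aware.

```python
import calendar
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical change-kind labels mapped to the policy's windows, in months.
DEPRECATION_MONTHS = {"response-shape-change": 6, "endpoint-removal": 12}


def add_months(moment: datetime, months: int) -> datetime:
    """Calendar-month arithmetic, clamping to the last day of a shorter month."""
    index = moment.month - 1 + months
    year, month = moment.year + index // 12, index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def sunset_header(announced: datetime, change_kind: str) -> tuple[str, str]:
    """Return the (name, value) pair for the Sunset header as an HTTP-date."""
    when = add_months(announced.astimezone(timezone.utc), DEPRECATION_MONTHS[change_kind])
    return "Sunset", format_datetime(when, usegmt=True)
```

For an endpoint removal announced on 22 May 2026, this yields `Sunset: Sat, 22 May 2027 00:00:00 GMT`.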

Ask: What does the vendor's history of breaking changes actually look like in the last 24 months? The answer is more informative than the policy.

What to leave out

A few things are conspicuously absent from this template because they consistently cause more friction than they prevent.

Code samples in multiple languages. Maintain the OpenAPI schema and let generators produce the client code. Hand-written code samples drift and get cited as authoritative when they are not.

Diagrams of the happy path. Diagrams age badly and rarely capture the surfaces that matter (retries, failures, partial states). Spend the time on the failure-mode tables instead.

Aspirational service-level objectives. Put SLOs in the contract, not in the spec. Specs describe behaviour; contracts describe consequences.

Using the spec as a forcing function

The best moment to use this template is before the first line of integration code is written, because the questions in each section make vendor product teams commit on the record. If the answer to "what is the idempotency-key retention window" is "we'll get back to you," you have just found a risk. Write it down, scope around it, and re-ask in two weeks.

The integrations that ship on time are not the ones with the most code review; they are the ones whose specs answered the hard questions early. The integrations that go badly are almost always the ones where the spec was written after the first prototype, retrofitted to match whatever the code happened to do that week. Inverting that order is the cheapest reliability investment available to you.

DevLume runs integration discovery as the first deliverable on most engagements. We are happy to share the engagement-grade version of this integration specification template if you would like a starting point grounded in real production projects.