API Documentation

Integrate CargoParse into your TMS, ERP, or custom workflow.

API access is available on Haul, Fleet, and Terminal plans. Generate keys in Account → Developer.

Authentication

All requests require an Authorization header with your API key:

Authorization: Bearer cp_live_your_key_here

Keys can be created with an expiration (30 days, 90 days, 1 year, or no expiry). Revoke and regenerate them from the Developer tab. Never include keys in client-side code or git repositories.
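
One way to keep the key out of source code is to read it from the environment on the server. The sketch below is a minimal illustration: the variable name CARGOPARSE_API_KEY and the helper authHeaders are assumptions made for this example, not names defined by CargoParse.

```javascript
// Hypothetical helper: builds request headers from a key kept outside source control.
// CARGOPARSE_API_KEY is an assumed variable name, not one defined by CargoParse.
function authHeaders(apiKey = process.env.CARGOPARSE_API_KEY) {
  if (!apiKey) {
    throw new Error("Set CARGOPARSE_API_KEY before calling the API");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```

The returned object can be passed as the headers option of fetch or any other HTTP client running on your server.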

API keys and teams

Personal API keys are tied to an individual user. When that user joins a team, their personal keys are paused: requests return 403 with a migration hint, which prevents a personal-scope credential from silently authenticating against the team's data. Keys stay on the user record, so they resume automatically if the user leaves the team, and they can still be revoked from the Developer tab while paused.

Team-scoped API keys are available to org members with the Developer role or above. Create them alongside personal keys from the Developer tab; team keys read and write team-partition data. For non-programmatic integrations, teams can also use team webhooks or the shared team email-to-upload address.

Webhook Events

For real-time notifications instead of polling, configure a webhook endpoint in Account → Automation. CargoParse sends HTTP POST requests to your endpoint when document events occur. Each paid plan supports up to 10 endpoints with independent signing secrets and failure tracking.

Team webhooks fire for every team upload regardless of who submitted it. Personal webhook endpoints are paused while a user is on a team; a Developer or above can click Copy to team on any paused endpoint to add a team copy with a freshly rotated signing secret. The personal record stays in place so it's ready to resume if the user leaves.

Event Types

  • document.completed: Extraction succeeded; payload includes full results.
  • document.needs_review: Extraction succeeded but quality is below the configured threshold.
  • document.failed: Extraction failed; payload includes error details.
  • document.rejected: Document is not a recognized freight document type.
  • document.approved: A needs-review document was manually approved.

Example Payload

A document.completed event delivers the full extraction result:

{
  "event": "document.completed",
  "deliveryId": "evt_a1b2c3d4-5678-...",
  "timestamp": "2026-03-17T14:30:00.000Z",
  "documentId": "doc-456",
  "filename": "bol-march-17.pdf",
  "documentType": "BILL_OF_LADING",
  "qualityScore": 92,
  "qualityTier": "clean",
  "stats": { "total": 24, "ok": 22, "review": 1, "missing": 1 },
  "data": {
    "bol_number":      { "value": "BOL-12345",   "confidence": 96, "flag": "ok" },
    "shipper_name":    { "value": "ACME Freight", "confidence": 92, "flag": "ok" },
    "consignee_name":  { "value": "Globex Co.",   "confidence": 88, "flag": "ok" },
    "pickup_date":     { "value": "2026-03-17",   "confidence": 71, "flag": "review" }
    // … one entry per field on this document type
  },
  "lineItems": [
    {
      "groupId": "commodity_items",
      "items": [
        { "commodity_description": { "value": "Pallets of widgets", "confidence": 90, "flag": "ok" }, "commodity_weight": { "value": "1200 lbs", "confidence": 85, "flag": "ok" } }
        // … one object per row
      ]
    }
    // … one element per group on the document type (most types have one)
  ]
}

The data map is the same flat shape the document detail page renders. Each value is a FlaggedField: { value, confidence (0-100), flag ("ok"|"review"|"missing") }. lineItems is an array of { groupId, items } objects — groupId is the group identifier (e.g. commodity_items on a BOL, delivery_items on a POD), and items is the row array, where each row is a flat FlaggedField map. Both are present on document.completed, document.needs_review, and document.approved; absent on document.failed / document.rejected / automation_failure (no extraction happened).

Same shape across surfaces, different envelope. Webhooks put data and lineItems at the top level of the body. The REST APIs nest them: GET /api/v1/jobs/:jobId returns { result: { data, lineItems, stats, meta } }, and GET /api/v1/documents/:id returns { document: { latestResult: { data, lineItems, ... } } }. The fields inside are identical — only the wrapper changes — so a small adapter at the entry point of your handler keeps the rest of your pipeline shape-agnostic.
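
The adapter mentioned above could look roughly like the following. This is a sketch under the envelope shapes described in this section; the function names unwrapResult and pick are hypothetical.

```javascript
// Hypothetical adapter: accepts a webhook body, a GET /api/v1/jobs/:jobId response,
// or a GET /api/v1/documents/:id response, and returns { data, lineItems }.
function unwrapResult(body) {
  if (body && body.result) {
    return pick(body.result); // jobs API envelope
  }
  if (body && body.document && body.document.latestResult) {
    return pick(body.document.latestResult); // documents API envelope
  }
  return pick(body || {}); // webhook: fields sit at the top level
}

// Falls back to empty containers when no extraction is present
// (for example on document.failed or document.rejected).
function pick(source) {
  return { data: source.data || {}, lineItems: source.lineItems || [] };
}
```

Calling unwrapResult once at the entry point means downstream code only ever sees { data, lineItems }, whichever surface the payload came from.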

Two optional boolean flags may appear on document.completed when it isn't a fresh-extraction event:

  • templateRecompute: true means the document was re-evaluated against an updated export template and just flipped from needs_review to clean. The original extraction didn't change. Useful if you want to filter these out of an intake pipeline.
  • manualRefire: true means the user manually re-pushed this doc's current state via POST /api/v1/documents/:id/refire, typically after editing fields. Use this to ignore replays in idempotent intake systems, or to surface them differently in your UI.
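
A handler might branch on these flags before doing any intake work. The sketch below is illustrative only: the function name and the three return labels are made up for this example, and only the two flag names come from the payload.

```javascript
// Returns "fresh", "template_recompute", or "manual_refire" for a
// document.completed payload. The labels are illustrative, not API values.
function completionKind(payload) {
  if (payload.manualRefire === true) return "manual_refire";
  if (payload.templateRecompute === true) return "template_recompute";
  return "fresh";
}
```

An intake pipeline that should only ingest new extractions could then skip any delivery where completionKind(payload) is not "fresh".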

Signature Verification

Every request includes an X-CargoParse-Signature header. Verify it to ensure the payload is authentic and untampered:

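// Header format:
// X-CargoParse-Signature: t=<unix_timestamp>,v1=<hex_hmac>

// Verification (Node.js):
const crypto = require("crypto");

// `body` must be the raw request body string exactly as received,
// not JSON that has been parsed and re-serialized.
function verifyWebhook(body, signatureHeader, secret, toleranceSeconds = 300) {
  const parts = Object.fromEntries(
    String(signatureHeader || "").split(",").map(p => p.trim().split("=", 2))
  );
  // A missing or malformed header is treated as a failed verification.
  if (!parts.t || !parts.v1) return false;

  // Reject deliveries with a timestamp outside your tolerance window.
  // Defends against replay attacks if a delivery is captured + replayed later.
  // Written as !(x <= tolerance) so a non-numeric timestamp (NaN) is rejected too.
  const now = Math.floor(Date.now() / 1000);
  if (!(Math.abs(now - Number(parts.t)) <= toleranceSeconds)) return false;

  // Signed payload format: "t=<timestamp>.<raw_body>". Note that the
  // literal "t=" prefix is included in the HMAC input.
  const expected = crypto
    .createHmac("sha256", secret)
    .update("t=" + parts.t + "." + body)
    .digest();
  const received = Buffer.from(parts.v1, "hex");

  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(expected, received);
}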

The t= timestamp is included in the HMAC input, so any tampering with it invalidates the signature. We recommend a tolerance window of 5 minutes (300 seconds) — long enough to absorb our 35-second retry window and clock skew, short enough to make replay impractical.
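
To exercise a verifier in local tests without waiting for a real delivery, a test payload can be signed using the same format described above. The helper below is a sketch; signForTest is a hypothetical name, and the secret you pass should be a throwaway test value rather than a live signing secret.

```javascript
const crypto = require("crypto");

// Builds an X-CargoParse-Signature header value for a raw body (local tests only).
// The HMAC input is "t=<timestamp>.<raw_body>", matching the format described above.
function signForTest(rawBody, secret, timestamp = Math.floor(Date.now() / 1000)) {
  const hmac = crypto
    .createHmac("sha256", secret)
    .update("t=" + timestamp + "." + rawBody)
    .digest("hex");
  return `t=${timestamp},v1=${hmac}`;
}
```

Passing an old timestamp is a convenient way to check that your handler rejects deliveries outside the tolerance window.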

Delivery Behavior

CargoParse expects your endpoint to respond within 10 seconds. If delivery fails, it makes up to 3 attempts in total, retrying with increasing delays (5s, then 30s). A 4xx response is treated as a permanent failure and is not retried. After all attempts fail, an automation_failure email notification is sent to the account owner.

Circuit breaker: After 10 consecutive delivery failures, the affected endpoint is automatically paused while others keep running. You can re-enable it from Account → Automation.

Multiple endpoints: You can configure up to 10 webhook endpoints per account or organization. Each endpoint has its own URL, secret, event filters, and independent failure tracking. Manage them at Account → Automation (solo) or Account → Team (org). Endpoints are configured via the UI, not the API.

Replay: Failed deliveries can be replayed from the delivery log in Account → Automation. Delivery logs (with per-attempt timestamps) are available for 7 days.
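
Because deliveries can be retried and replayed, a receiver may see the same event more than once. The sketch below shows one possible in-memory dedupe keyed on deliveryId. It rests on an assumption this documentation does not confirm: that retries and replays of a delivery reuse the same deliveryId. The helper name createDeduper and the 7-day default (chosen to match the delivery log retention) are illustrative.

```javascript
// In-memory dedupe keyed on deliveryId.
// ASSUMPTION: retries and replays of one delivery reuse the same deliveryId;
// the documentation above does not confirm this.
function createDeduper(ttlMs = 7 * 24 * 60 * 60 * 1000, now = Date.now) {
  const seen = new Map(); // deliveryId -> expiry time in ms

  return function isDuplicate(deliveryId) {
    const t = now();
    // Drop expired entries so the map does not grow without bound.
    for (const [id, expiresAt] of seen) {
      if (expiresAt <= t) seen.delete(id);
    }
    if (seen.has(deliveryId)) return true;
    seen.set(deliveryId, t + ttlMs);
    return false;
  };
}
```

A process-local Map is lost on restart and is not shared between instances, so a production receiver would more likely keep these IDs in a database or cache.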

Upload Documents

Document upload is a 3-step flow: get a presigned URL, upload to S3, then enqueue for processing. File bytes go directly to S3 — never through the CargoParse API server.

Limits: up to 30 files per request, 15 MB per file. Accepted types: PDF, JPEG, PNG, TIFF. Multi-page captures (e.g. phone photos of a BOL) can be bundled into a single logical document via the optional groups parameter.
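
Checking these limits on the client before requesting upload URLs avoids a round trip that would fail. The following is a sketch: validateFiles is a hypothetical helper, and reading "15 MB" as 15 × 1024 × 1024 bytes is an assumption.

```javascript
const MAX_FILES = 30;
const MAX_BYTES = 15 * 1024 * 1024; // ASSUMPTION: "15 MB" is read as 15 MiB
const ACCEPTED_TYPES = new Set([
  "application/pdf",
  "image/jpeg",
  "image/png",
  "image/tiff",
]);

// Takes entries shaped like the "files" array ({ name, size, mimeType }) and
// returns a list of problems; an empty list means the request looks valid.
function validateFiles(files) {
  const problems = [];
  if (files.length === 0) problems.push("no files given");
  if (files.length > MAX_FILES) {
    problems.push(`too many files: ${files.length} > ${MAX_FILES}`);
  }
  for (const f of files) {
    if (f.size > MAX_BYTES) problems.push(`${f.name}: larger than 15 MB`);
    if (!ACCEPTED_TYPES.has(f.mimeType)) {
      problems.push(`${f.name}: unsupported type ${f.mimeType}`);
    }
  }
  return problems;
}
```

This is only a pre-filter on the declared MIME type; the server still applies its own validation.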

# Step 1 — get presigned upload URL(s)
POST /api/v1/jobs
Authorization: Bearer cp_live_your_key
Content-Type: application/json

{
  "files": [
    { "name": "bol-123.pdf", "size": 245000, "mimeType": "application/pdf" }
  ]
}

# Optional — bundle several images into one logical document (multi-page capture).
# Max 5 groups per request, 2–10 files per group. fileIndices reference the files array above.
# "groups": [{ "groupName": "POD March 17", "fileIndices": [0, 1, 2] }]

→ {
    "ok": true,
    "jobs": [{
      "jobId": "abc-123",
      "documentId": "doc-456",
      "filename": "bol-123.pdf",
      "uploadUrl": "https://s3.amazonaws.com/...",
      "uploadFields": { "key": "...", "policy": "...", "x-amz-signature": "...", ... }
    }]
  }

# Step 2 — upload file bytes to S3 via presigned POST (5-min expiry)
POST <uploadUrl>
Content-Type: multipart/form-data
# Include all uploadFields as form fields, then append file as the last field

# Step 3 — enqueue for processing
POST /api/v1/jobs/enqueue
Authorization: Bearer cp_live_your_key
Content-Type: application/json

{ "jobIds": ["abc-123"] }

→ { "ok": true, "jobs": [{ "jobId": "abc-123", "documentId": "doc-456", "filename": "bol-123.pdf" }] }
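
The three steps above can be sketched end to end in Node.js 18+ (which provides global fetch, FormData, and Blob). The function names are hypothetical, baseUrl is a caller-supplied parameter because this page does not state the API host, and the form field name "file" follows the S3 presigned POST convention.

```javascript
// Step 2 helper: a presigned POST needs every uploadFields entry first
// and the file itself as the last form field.
function buildUploadForm(uploadFields, fileBlob, filename) {
  const form = new FormData();
  for (const [key, value] of Object.entries(uploadFields)) {
    form.append(key, value);
  }
  form.append("file", fileBlob, filename);
  return form;
}

// `file` is { name, blob }, where blob is a Blob with its MIME type set.
async function uploadDocument(baseUrl, apiKey, file) {
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };

  // Step 1: request a presigned upload URL.
  const created = await fetch(`${baseUrl}/api/v1/jobs`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      files: [{ name: file.name, size: file.blob.size, mimeType: file.blob.type }],
    }),
  }).then((r) => r.json());
  if (!created.ok) throw new Error(created.error || "Could not create job");
  const job = created.jobs[0];

  // Step 2: send the bytes straight to S3 (the URL expires after 5 minutes).
  // fetch sets the multipart Content-Type and boundary for a FormData body.
  const s3 = await fetch(job.uploadUrl, {
    method: "POST",
    body: buildUploadForm(job.uploadFields, file.blob, file.name),
  });
  if (!s3.ok) throw new Error(`S3 upload failed with status ${s3.status}`);

  // Step 3: enqueue the uploaded job for processing.
  const queued = await fetch(`${baseUrl}/api/v1/jobs/enqueue`, {
    method: "POST",
    headers,
    body: JSON.stringify({ jobIds: [job.jobId] }),
  }).then((r) => r.json());
  if (!queued.ok) throw new Error(queued.error || "Could not enqueue job");

  return job; // includes jobId and documentId for later polling and retrieval
}
```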

Poll for Results

Poll GET /api/v1/jobs/:jobId until status is terminal. No auth required — jobIds are UUIDs known only to you.

# Poll until status is SUCCEEDED, FAILED, or REJECTED (no auth required)
GET /api/v1/jobs/abc-123

→ { "status": "SUCCEEDED", "documentId": "doc-456" }  # or PROCESSING, QUEUED…

Typical processing time is 5–30 seconds. Recommended poll interval: 2s initial, backing off to 5s. Status values: QUEUED, PROCESSING, SUCCEEDED, FAILED, REJECTED.
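
A polling loop along these lines might look like the sketch below (Node.js 18+ for global fetch). The 1.5× growth factor and the cap of 30 polls are assumptions; the text above only gives the 2s starting interval and the 5s ceiling. The names pollDelayMs and waitForJob are hypothetical, and baseUrl is supplied by the caller.

```javascript
const TERMINAL = new Set(["SUCCEEDED", "FAILED", "REJECTED"]);

// Delay before the next poll (attempt is 0-based): starts at 2s, grows by 1.5x,
// and is capped at 5s. The growth factor is an assumption.
function pollDelayMs(attempt) {
  return Math.min(5000, Math.round(2000 * Math.pow(1.5, attempt)));
}

async function waitForJob(baseUrl, jobId, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetch(`${baseUrl}/api/v1/jobs/${jobId}`).then((r) => r.json());
    if (TERMINAL.has(job.status)) return job;
    // Still QUEUED or PROCESSING: wait, then poll again.
    await new Promise((resolve) => setTimeout(resolve, pollDelayMs(attempt)));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```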

Retrieve Extracted Data

GET /api/v1/documents/doc-456/export?format=json
Authorization: Bearer cp_live_your_key

→ {
    "meta": { "documentType": "BILL_OF_LADING", "textSource": "embedded" },
    "data": {
      "bol_number": { "value": "BOL-12345", "confidence": 95, "flag": "ok" },
      "shipper_name": { "value": "ACME Corp", "confidence": 88, "flag": "ok" },
      "weight_lbs": { "value": "5000 LBS", "confidence": 55, "flag": "review" }
    },
    "lineItems": [
      {
        "groupId": "commodity_items",
        "label": "Commodity Items",
        "items": [
          {
            "commodity_description": { "value": "Electronics — Desktop Computers", "confidence": 88, "flag": "ok" },
            "commodity_weight": { "value": "2,500 LBS", "confidence": 82, "flag": "ok" },
            "commodity_pieces": { "value": "50", "confidence": 90, "flag": "ok" }
          }
        ]
      }
    ],
    "stats": { "total": 24, "ok": 18, "review": 4, "missing": 2 }
  }

confidence is an integer in the range 0–100 (or null when the field is absent). flag values: ok (high confidence), review (below threshold), missing (not found). Supported formats: ?format=json, ?format=csv, ?format=xlsx, ?format=pdf. Add &templateId=tmpl_... to apply a column mapping template.
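
A common first step after retrieving the JSON export is to collect the fields that need a human look. The sketch below walks both data and lineItems and returns every field whose flag is not ok. The function name and the path format (for example commodity_items[0].commodity_weight) are illustrative choices, not part of the API.

```javascript
// Lists every field whose flag is not "ok", from both data and lineItems.
function fieldsNeedingAttention(result) {
  const found = [];
  for (const [key, field] of Object.entries(result.data || {})) {
    if (field.flag !== "ok") {
      found.push({ path: key, flag: field.flag, value: field.value });
    }
  }
  for (const group of result.lineItems || []) {
    group.items.forEach((row, index) => {
      for (const [key, field] of Object.entries(row)) {
        if (field.flag !== "ok") {
          found.push({
            path: `${group.groupId}[${index}].${key}`,
            flag: field.flag,
            value: field.value,
          });
        }
      }
    });
  }
  return found;
}
```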

Endpoint Reference

Documents

GET /api/v1/documents

List documents. Supports ?limit=, ?cursor=, ?search=, ?status=, ?documentType=.

GET /api/v1/documents/:id

Get a single document. Response includes the doc metadata + latestResult (data, lineItems, stats, meta), qualityScore (0-100 overall confidence), qualityTier (clean | needs_review | approved), and — when a template applies — templateStats and resolvedTemplate.

DELETE /api/v1/documents/:id

Permanently delete a document and its S3 file.

GET /api/v1/documents/:id/export

Download extracted data. ?format=json|csv|xlsx|pdf. Optional &templateId=.

POST /api/v1/documents/:id/reprocess

Re-run extraction on an existing document. Optionally pass { "documentType": "RATE_CONFIRMATION" } to override the auto-classified type. Free, does not use a credit.

PATCH /api/v1/documents/:id/fields

Update extracted field values. Body: { fields: { field_key: <value> }, lineItems?, expectedUpdatedAt? }. Each <value> can be (a) a bare string ("ACME Inc."), (b) null to clear the field, or (c) a full FlaggedField object { value, confidence, flag, edited }. expectedUpdatedAt is an optimistic-concurrency token (returns 409 if the server-side updatedAt has moved on since you read it).
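
As an illustration, a request body for this endpoint could be assembled as in the sketch below. It covers only the bare-string and null forms of <value>; the helper name buildFieldPatch is hypothetical, and the sketch assumes expectedUpdatedAt is the updatedAt value you read from the document earlier.

```javascript
// Builds a PATCH /api/v1/documents/:id/fields body from plain edits.
// `edits` maps field_key -> string (new value) or null (clear the field).
function buildFieldPatch(edits, expectedUpdatedAt) {
  const fields = {};
  for (const [key, value] of Object.entries(edits)) {
    if (value !== null && typeof value !== "string") {
      throw new TypeError(`${key}: expected a string or null`);
    }
    fields[key] = value;
  }
  const body = { fields };
  // Include the concurrency token only when the caller has one.
  if (expectedUpdatedAt !== undefined) body.expectedUpdatedAt = expectedUpdatedAt;
  return body;
}
```

If the server answers 409, the document changed after you read it; re-fetch it, re-apply the edits, and send a new body with the fresh token.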

POST /api/v1/documents/:id/approve

Mark a needs-review document as approved. Clears review flags on populated fields, sets qualityTier=approved, and fires the document.approved webhook/email.

POST /api/v1/documents/:id/refire

Manually re-fire document.completed to your webhooks and emails (e.g. after editing fields). Payload carries manualRefire: true so subscribers can distinguish a manual replay from a fresh extraction. Returns { ok, fired: { email, webhook } }.

POST /api/v1/documents/search

Advanced search. Body: { documentType?, fields?: { field_key: "substring" }, dateRange?: { from, to }, q?, cursor?, limit? }. Field filters use AND; q uses OR across high-value fields.

GET /api/v1/documents/:id/viewer

Get a presigned S3 URL (15-min expiry) for the original file. Returns { presignedUrl, pageUrls, fileType, isImage, sourceFileDeleted }. Useful for embedding the source document in your own UI.

Jobs

POST /api/v1/jobs

Initiate upload — returns presigned S3 POST URLs. Body: { files: [{ name, size, mimeType }] }.

POST /api/v1/jobs/enqueue

Queue jobs for processing after S3 upload. Body: { jobIds: [...] }.

GET /api/v1/jobs/:jobId

Poll job status. No auth required.

Export Templates

GET /api/v1/export-templates

List templates. Optional ?documentType=BILL_OF_LADING.

POST /api/v1/export-templates

Create a template. Body: { name, documentType, columns: [{ sourceField, outputName }], lineItemGroups?: [{ groupId, sheetName, columns: [...] }] }.

GET /api/v1/export-templates/:id

Get a single template.

PUT /api/v1/export-templates/:id

Update a template.

DELETE /api/v1/export-templates/:id

Delete a template.

Batch Export

POST /api/v1/documents/export-batch

Export multiple documents. Body: { documentIds: [...], templateId, format?: "xlsx"|"csv"|"json" }. Max 100 docs.
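
When you have more than 100 documents to export, the ID list has to be split across several requests. A minimal sketch, with batchExportBodies as a hypothetical helper name:

```javascript
// Splits documentIds into request bodies of at most 100 IDs each.
// `format` is passed through only when given, since it is optional.
function batchExportBodies(documentIds, templateId, format) {
  const bodies = [];
  for (let i = 0; i < documentIds.length; i += 100) {
    const body = { documentIds: documentIds.slice(i, i + 100), templateId };
    if (format) body.format = format;
    bodies.push(body);
  }
  return bodies;
}
```

Each returned object can be sent as the JSON body of its own POST /api/v1/documents/export-batch request.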

Errors

400  Bad request or validation failure
401  Missing, invalid, or revoked API key
402  Plan limit reached; upgrade or wait for reset
404  Resource not found
409  Conflict (e.g. template limit reached)
413  File exceeds 15 MB size limit
429  Rate limit exceeded; check Retry-After header
500  Server error

Error Handling

All error responses return JSON with an error field. Here's how to handle common cases:

# Rate limit exceeded — retry after the indicated wait
HTTP 429
{ "ok": false, "error": "Rate limit exceeded" }
# → Check Retry-After header and retry after that many seconds

# Plan limit reached — user needs to upgrade or wait for reset
HTTP 402
{ "ok": false, "error": "Monthly document limit reached" }
# → Show a message to the user; limits reset on their billing anniversary

# File validation failure
HTTP 400
{ "ok": false, "error": "Unsupported file type" }
# → Accepted types: PDF, JPEG, PNG, TIFF (detected from file content, not extension)

# Extraction failed — the document could not be processed
GET /api/v1/jobs/:jobId → { "status": "FAILED", "error": "..." }
# → Call POST /api/v1/documents/:id/reprocess to retry (free, no credit charged)
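
In client code, the cases above often reduce to a small lookup from status code to next action. The sketch below is one possible mapping; the action labels are invented for this example, and treating 5xx responses as retryable is an assumption rather than documented behavior.

```javascript
// Maps an HTTP status from the Errors table to a coarse client action.
// Action names are illustrative. ASSUMPTION: 5xx responses are transient.
function actionForStatus(status) {
  if (status === 429 || status >= 500) return "retry";
  if (status === 401 || status === 403) return "check_api_key"; // invalid, revoked, or paused key
  if (status === 402) return "notify_user"; // plan limit reached
  if (status === 409) return "resolve_conflict";
  if (status >= 400) return "fix_request"; // 400, 404, 413, ...
  return "ok";
}
```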

Rate Limits & Retry Strategy

Haul: 30 requests/min · 10 keys
Fleet: 60 requests/min · 25 keys
Terminal: 120 requests/min · 50 keys

When you exceed the rate limit, the API returns HTTP 429 with headers indicating when you can retry:

HTTP 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710680460
Retry-After: 42

{
  "ok": false,
  "error": "Rate limit exceeded"
}

Recommended Retry Strategy

Use exponential backoff with the Retry-After header:

  • If Retry-After is present, wait that many seconds before retrying.
  • Otherwise, use exponential backoff: 1s, 2s, 4s, 8s, up to 60s max.
  • Add random jitter (0-500ms) to prevent thundering herd on shared rate windows.
  • After 5 consecutive failures, stop retrying and alert your monitoring system.
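
The strategy above can be sketched as a single delay function. The helper name retryDelayMs and the injectable random parameter are choices made for this example, and adding jitter on top of Retry-After is one reading of the list rather than a documented requirement.

```javascript
const MAX_CONSECUTIVE_FAILURES = 5; // stop retrying and alert after this many

// Delay in ms before retry number `attempt` (0-based).
// `retryAfterSeconds` is the Retry-After header value, if any.
// `random` is injectable so the jitter can be fixed in tests.
function retryDelayMs(attempt, retryAfterSeconds, random = Math.random) {
  const jitter = Math.floor(random() * 501); // 0-500 ms
  const hasHeader = retryAfterSeconds != null && retryAfterSeconds !== "";
  const retryAfter = Number(retryAfterSeconds);
  if (hasHeader && Number.isFinite(retryAfter) && retryAfter >= 0) {
    return retryAfter * 1000 + jitter;
  }
  // No usable header: 1s, 2s, 4s, 8s, ... capped at 60s.
  const backoff = Math.min(60000, 1000 * 2 ** attempt);
  return backoff + jitter;
}
```

A caller would count consecutive failures and stop once the count reaches MAX_CONSECUTIVE_FAILURES instead of sleeping again.
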
Questions or missing a feature? Reach out.