The parsing API for developers who don't have time to clean up messy output. One endpoint, every document, anchored JSON.
Every extracted field carries a page number and bounding box back to its origin. No black box, no hallucinations, no guessing where the number came from.
| (MSEK) | Note | 2025 | 2024 |
|---|---|---|---|
| Net sales | 2 | 25,758 | 23,104 |
| Cost of goods sold | 6 | (18,392) | (16,847) |
| Gross profit | | 7,366 | 6,257 |
| Operating expenses | 9 | (3,914) | (3,512) |
| Operating profit | | 3,452 | 2,745 |
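To make the anchoring concrete, here is a minimal Python sketch of how a client might read one anchored field such as Net sales from the table above. The response shape (`value`, `page` and `bbox` keys, with the box given as `[x0, y0, x1, y1]`) is an assumption for illustration, not Docule's documented format, and the page number and coordinates are invented.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredField:
    """One extracted value plus where it was found (hypothetical shape)."""
    name: str
    value: str
    page: int
    bbox: tuple  # assumed order: x0, y0, x1, y1


def parse_anchored_fields(payload: str) -> list:
    """Turn a JSON object of {field: {value, page, bbox}} into typed records."""
    raw = json.loads(payload)
    fields = []
    for name, item in raw.items():
        bbox = tuple(float(c) for c in item["bbox"])
        if len(bbox) != 4:
            raise ValueError(f"{name}: bbox must have 4 coordinates")
        fields.append(AnchoredField(name, item["value"], int(item["page"]), bbox))
    return fields


# Illustrative payload only; the page and coordinates are made up.
sample = """
{
  "net_sales_2025": {"value": "25,758", "page": 4,
                     "bbox": [412.0, 188.5, 468.0, 200.5]}
}
"""

fields = parse_anchored_fields(sample)
for f in fields:
    print(f"{f.name} = {f.value} (page {f.page}, bbox {f.bbox})")
```

Keeping the anchor next to the value means a reviewer can jump straight to the source region instead of searching the PDF for the number.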
Nordic comma decimals. Continental dates. Mixed scripts. We normalise everything to canonical form, so your pipeline doesn't have to.
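As a rough idea of what that normalisation involves, the sketch below parses comma-decimal amounts and continental dates in plain Python. It is an illustration under stated assumptions, not Docule's implementation: the function names are hypothetical, and the caller has to say which character is the decimal separator, because a string like `25,758` is ambiguous on its own.

```python
from datetime import date
from decimal import Decimal


def normalise_amount(text: str, decimal_sep: str = ",") -> Decimal:
    """Parse an amount written with the given decimal separator.

    Accepts space or punctuation thousands separators and accounting-style
    parentheses for negatives, e.g. "(18.392,50)" -> Decimal("-18392.50").
    """
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    # Typeset reports sometimes use the Unicode minus sign.
    s = s.replace("\u2212", "-")
    # Drop plain, non-breaking and narrow no-break spaces used for grouping.
    for ch in (" ", "\u00a0", "\u202f"):
        s = s.replace(ch, "")
    thousands_sep = "." if decimal_sep == "," else ","
    s = s.replace(thousands_sep, "").replace(decimal_sep, ".")
    value = Decimal(s)
    return -value if negative else value


def normalise_date(text: str) -> date:
    """Parse a continental DD.MM.YYYY date; use .isoformat() for ISO 8601."""
    day, month, year = (int(part) for part in text.strip().split("."))
    return date(year, month, day)


print(normalise_amount("25 758,50"))
print(normalise_amount("(18.392,00)"))
print(normalise_date("31.12.2025").isoformat())
```

`Decimal` is used instead of `float` so that financial amounts keep their exact digits after parsing.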
Bring your own JSON Schema. Docule fills it, validates it, and never invents data to fit. Missing values come back clean.
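A schema for the income statement might look like the sketch below. The field names are illustrative, and the idea that a missing value comes back as JSON `null` is an assumption about what "clean" means here, which is why each numeric field allows `null`.

```python
import json

# Hypothetical schema; property names are illustrative, not prescribed by Docule.
income_statement_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "IncomeStatement",
    "type": "object",
    "properties": {
        "currency_unit": {"type": "string"},
        "net_sales": {"type": ["number", "null"]},
        "cost_of_goods_sold": {"type": ["number", "null"]},
        "gross_profit": {"type": ["number", "null"]},
        "operating_expenses": {"type": ["number", "null"]},
        "operating_profit": {"type": ["number", "null"]},
    },
    "required": [
        "currency_unit",
        "net_sales",
        "cost_of_goods_sold",
        "gross_profit",
        "operating_expenses",
        "operating_profit",
    ],
    "additionalProperties": False,
}


def missing_fields(result: dict) -> list:
    """List required fields that are absent or null in a parsed result."""
    return [
        key
        for key in income_statement_schema["required"]
        if result.get(key) is None
    ]


# Serialised form, e.g. to save as income_statement.schema.json.
schema_text = json.dumps(income_statement_schema, indent=2)

# Example result where one value could not be found in the document.
example_result = {
    "currency_unit": "MSEK",
    "net_sales": 25758,
    "cost_of_goods_sold": -18392,
    "gross_profit": 7366,
    "operating_expenses": None,
    "operating_profit": 3452,
}
print(missing_fields(example_result))
```

Declaring every field as required but nullable keeps the output shape stable, so downstream code can tell "not found" apart from "field forgotten".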
Built on Nordic financial filings — the hardest parsing task there is. If we handle those, your invoices are easy.
A 200-page PDF holding 50 invoices comes back as 50 sub-documents — each with its own page range and title. Same JSON shape, no extra calls.
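One way a client could sanity-check a split like this is sketched below. The `SubDocument` shape, the 1-based inclusive page numbers, and the expectation that the ranges cover the file without gaps are all assumptions for illustration; a real bundle may contain cover sheets or blank pages that belong to no invoice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubDocument:
    """Hypothetical sub-document record: a title plus its page range."""
    title: str
    first_page: int  # 1-based, inclusive (assumed convention)
    last_page: int   # 1-based, inclusive


def check_page_ranges(subdocs: list, total_pages: int) -> None:
    """Raise ValueError if ranges overlap, leave gaps, or miss trailing pages."""
    expected_start = 1
    for doc in sorted(subdocs, key=lambda d: d.first_page):
        if doc.first_page != expected_start:
            raise ValueError(
                f"{doc.title!r}: expected to start on page {expected_start}"
            )
        if doc.last_page < doc.first_page:
            raise ValueError(f"{doc.title!r}: reversed page range")
        expected_start = doc.last_page + 1
    if expected_start != total_pages + 1:
        raise ValueError(f"pages {expected_start}-{total_pages} are not covered")


# 50 invoices of 4 pages each, mirroring the 200-page example above.
subdocs = [
    SubDocument(f"Invoice {i + 1}", 4 * i + 1, 4 * i + 4) for i in range(50)
]
check_page_ranges(subdocs, total_pages=200)
print(len(subdocs), subdocs[0], subdocs[-1])
```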
One endpoint. Bring your own SDK or just curl it. Zero hidden behavior.

    # Submit a document for parsing
    curl -X POST https://docule.dev/api/v1/parse \
      -H "X-API-Key: $DOCULE_API_KEY" \
      -F "file=@report.pdf" \
      -F "schema_file=@income_statement.schema.json" \
      -G --data-urlencode "formats=json,md"
    # → 200 OK · structured JSON with bbox anchors

Annual reports, 10-Ks, interim statements with multi-language footnotes and cross-page tables.
VAT-aware extraction across EU jurisdictions. Multi-line, multi-currency, with built-in schema validation.
Clause-level extraction with paragraph anchors. Bring your taxonomy, get structured fields back.
OCR with handwriting, checkboxes, and signatures. Auto-routed to vision when text extraction fails.
Equations preserved as LaTeX, citations linked, figures captioned with bbox.
Documentation, manuals, spec sheets. Structure preserved for RAG and LLM agent pipelines.
Start free with 100+ pages every month. No card required. Pay only for what you use — easier pages cost fewer credits.
Docule routes each page to the cheapest path that still produces a clean, anchored output. You only pay extra credits when the page actually needs heavier processing.
Pure text-extractable PDFs. PyMuPDF only — no LLM call, no vision API.
Tables + LLM validation + locale normalisation + schema extraction. Most documents land here.
Scanned PDFs, complex layouts, OCR. Vision API invoked automatically when needed.
A typical mix (60% text · 30% smart · 10% vision) averages ~17 credits/page.
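The blended figure is a weighted average of the per-tier costs. The sketch below shows the arithmetic only: the per-page costs are placeholders chosen so the result lands near the quoted figure, not Docule's published prices, and the function name is hypothetical.

```python
def blended_credits_per_page(mix: dict, cost: dict) -> float:
    """Weighted average cost per page for a given share of pages per tier."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * cost[tier] for tier, share in mix.items())


# Share of pages per tier, taken from the typical mix quoted above.
mix = {"text": 0.60, "smart": 0.30, "vision": 0.10}

# PLACEHOLDER credits per page; substitute the real values from the pricing table.
placeholder_cost = {"text": 1, "smart": 30, "vision": 70}

average = blended_credits_per_page(mix, placeholder_cost)
print(f"about {average:.1f} credits/page")
```

Because the vision tier dominates the total even at a small share, moving a few scanned pages to a text-extractable source changes the average more than anything else in the mix.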
100+ free pages every month, on the house. No card. No expiry.