# Docule — Complete Technical Reference

> PDF parsing API that converts complex documents into structured Markdown, JSON, and plain text. Specializes in financial tables, annual reports, invoices, and multi-column layouts.

Docule is a document intelligence API at [docule.dev](https://docule.dev). It provides production-quality PDF-to-structured-data conversion via a REST API.

## Overview

Docule solves the problem of extracting structured data from PDF documents, particularly complex financial documents where traditional OCR and text extraction tools fail. Its multi-stage pipeline combines layout detection, table extraction, and quality validation to produce accurate, structured output.

### Why Docule?

1. **Accuracy on complex tables**: Financial documents contain side-by-side layouts, merged cells, nested hierarchies, and multi-column reports. Docule handles these correctly where other parsers produce garbled output.
2. **Adaptive cost**: Simple text pages use fast extraction. Only complex pages escalate to higher-fidelity processing, so you pay for what each page needs.
3. **RAG-optimized output**: Each page includes keyword metadata and context headers so downstream applications always know what they're looking at.
4. **Built-in quality assurance**: Column consistency checks, header detection, and sum verification. Failed pages are automatically re-processed at higher fidelity.

## API Reference

Base URL: `https://docule.dev/api/v1`

Authentication: API key via the `X-API-Key: YOUR_API_KEY` header. Generate keys from the [API keys dashboard](https://docule.dev/dashboard/api-keys) after signing up.
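
Because authentication is a single request header, a client can be built on the Python standard library alone. The sketch below assumes only the base URL and `X-API-Key` header documented here; `build_request` is a hypothetical helper, not part of any official Docule SDK. It builds a `urllib.request.Request` without sending it; passing the result to `urllib.request.urlopen` would perform the actual call.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://docule.dev/api/v1"


def build_request(path, api_key, params=None, method="GET"):
    """Build (but do not send) an authenticated request to the Docule API.

    Hypothetical helper: joins the path onto the base URL, appends any
    query parameters, and attaches the API key in the X-API-Key header.
    """
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, method=method)
    req.add_header("X-API-Key", api_key)
    return req


req = build_request("result/JOB_ID", "YOUR_API_KEY", {"format": "json"})
# urllib stores header names via str.capitalize(), so it is read back as "X-api-key".
print(req.full_url, req.get_header("X-api-key"))
```

HTTP header names are case-insensitive, so the normalized `X-api-key` spelling that urllib stores is the same header the server expects.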
### Endpoints

#### Submit a PDF for parsing

```
POST /api/v1/parse
Content-Type: multipart/form-data
X-API-Key: YOUR_API_KEY

Parameters:
- file (required, form field): PDF file to parse (max 50 MB)
- formats (query, optional): Comma-separated output formats: json, md, txt (default: "json,md")
- no_vision (query, optional): Disable vision API calls (default: false)
- force_vision (query, optional): Force vision for all pages (default: false)
- agentic (query, optional): Enable agentic quality scan (default: false)
- document_type (query, optional): auto or report (default: "auto")
```

Response: `202 Accepted` with `{"job_id": "...", "status": "queued", "created_at": "..."}`.

#### Check job status

```
GET /api/v1/status/{job_id}
X-API-Key: YOUR_API_KEY

Response: Job state, progress, page counts, cost, quality score.
```

#### Get the parsed result

```
GET /api/v1/result/{job_id}?format=json
X-API-Key: YOUR_API_KEY

format query parameter: json (default), md, or txt

Response: Structured result with pages, tables, metadata.
```

#### Download raw output

```
GET /api/v1/result/{job_id}/raw?format=md
X-API-Key: YOUR_API_KEY

format query parameter: md (default), json, or txt

Response: The file contents as text/plain or application/json.
```

#### List recent jobs

```
GET /api/v1/jobs?limit=50&status=completed
X-API-Key: YOUR_API_KEY

limit: 1-200 (default 50)
status: optional filter by job status

Response: Array of recent jobs scoped to the authenticated account.
```

#### Health check

```
GET /api/v1/health
(No auth required.)

Response: {"active_jobs": N, "queued_jobs": N}
```

### Example Usage (Python)

```python
import time

import requests

API = "https://docule.dev/api/v1"
headers = {"X-API-Key": "YOUR_API_KEY"}

# 1. Submit a PDF for parsing (the context manager closes the file handle)
with open("report.pdf", "rb") as f:
    resp = requests.post(
        f"{API}/parse",
        headers=headers,
        files={"file": f},
        params={"formats": "json,md"},
    )
resp.raise_for_status()  # expect 202 Accepted
job_id = resp.json()["job_id"]

# 2. Poll until the job reaches a terminal state
while True:
    status = requests.get(f"{API}/status/{job_id}", headers=headers).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)

# Fetching the result of a failed job returns 409, so stop here instead
if status["status"] == "failed":
    raise RuntimeError(f"Job {job_id} failed")

# 3. Fetch the structured result
result = requests.get(
    f"{API}/result/{job_id}",
    headers=headers,
    params={"format": "json"},
)
result.raise_for_status()
for page in result.json()["result"]["pages"]:
    print(page["markdown"])
```

### Example Usage (cURL)

```bash
# Submit
curl -X POST https://docule.dev/api/v1/parse \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@report.pdf"

# Check status
curl https://docule.dev/api/v1/status/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

# Fetch result
curl https://docule.dev/api/v1/result/JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"

# Fetch markdown with page markers
curl "https://docule.dev/api/v1/result/JOB_ID/raw?format=md&include_page_markers=true" \
  -H "X-API-Key: YOUR_API_KEY"
```

### Error codes

| Code | Meaning |
|---|---|
| 400 | Bad request (e.g. non-PDF file, file too large) |
| 401 | Missing or invalid `X-API-Key` header |
| 403 | No active subscription, or job does not belong to your account |
| 404 | Job not found |
| 409 | Job has not completed yet, or it failed |
| 429 | Rate limit exceeded, concurrent job limit reached, or monthly quota exceeded |
| 503 | Service not ready |

### Rate limits

Per-plan request rates (applied to `POST /parse` and read endpoints):

| Plan | Requests/second | Monthly pages |
|---|---|---|
| Free | 1 | 100 |
| Starter | 3 | 2,000 |
| Pro | 10 | 6,000 |
| Business | 20 | 15,000 |

Max 1 concurrent parsing job per user. Max file size: 50 MB per PDF. Max files per job submission: 5,000.
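
These limits can also be respected on the client side. The sketch below is a generic pattern, not documented Docule behavior: a hypothetical `Throttle` spaces calls at least `1 / requests-per-second` apart for the chosen plan, and a hypothetical `call_with_retry` backs off exponentially while a call returns 429. The `send` callable stands in for any request function that returns an HTTP status code, and the clock and sleep functions are injectable so the logic can be exercised without network access. Note that 429 also covers the concurrent-job limit and an exhausted monthly quota; retrying cannot fix the latter, and telling the cases apart would require inspecting the response body, whose format is not documented here.

```python
import time

# Requests per second per plan, copied from the rate-limit table.
PLAN_RPS = {"free": 1, "starter": 3, "pro": 10, "business": 20}


class Throttle:
    """Client-side limiter: keeps calls at least 1/rps seconds apart."""

    def __init__(self, plan, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / PLAN_RPS[plan]
        self.clock = clock
        self.sleep = sleep
        self.last = None  # start time of the previous call, if any

    def wait(self):
        """Block until the next call is allowed, then record its start time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def call_with_retry(send, throttle, max_retries=5, base_delay=1.0):
    """Call send() and retry with exponential backoff while it returns 429.

    send() must return an HTTP status code. Any non-429 status is returned
    immediately; if every attempt returns 429, the last 429 is returned.
    """
    status = None
    for attempt in range(max_retries + 1):
        throttle.wait()
        status = send()
        if status != 429:
            return status
        if attempt < max_retries:
            throttle.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return status
```

In practice `send` would wrap one of the calls from the Python example, for instance `lambda: requests.get(status_url, headers=headers).status_code`, where `status_url` is a placeholder for the status endpoint URL.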
## Document Types

Docule handles all PDF document types, with particular strength in:

| Document Type | Key Capabilities |
|---|---|
| Annual reports | Financial tables, segment reports, notes with multi-column layouts |
| Invoices | Line items, totals, VAT calculation, vendor/buyer metadata |
| Bank statements | Transaction tables, running balances, multi-period statements |
| Contracts | Clause extraction, structured sections, appendix handling |
| Sustainability reports | ESRS/CSRD tables, KPI extraction, compliance data |
| SEC filings | 10-K/10-Q tables, XBRL-style structured data |
| Research papers | Academic tables, figure captions, citation sections |
| Tax filings | Form field extraction, schedule tables |

## Output Format

### JSON Response Structure (strict default)

```json
{
  "job_id": "abc123",
  "status": "completed",
  "result": {
    "pages": [
      {
        "text": "Income Statement\nItem 2025 2024...",
        "markdown": "### Income Statement\n| Item | 2025 | 2024 |\n...",
        "items": [
          {
            "type": "heading",
            "markdown": "### Income Statement",
            "text": "Income Statement",
            "level": 3
          },
          {
            "type": "table",
            "markdown": "| Item | 2025 | 2024 |\n|---|---|---|\n...",
            "text": "Item 2025 2024...",
            "table": {
              "rows": [["Revenue", "1,200", "1,050"]]
            }
          }
        ]
      }
    ]
  },
  "metadata": {
    "filename": "report.pdf",
    "pages": 12,
    "cost_usd": 0.08,
    "api_calls": 3
  }
}
```

Optional metadata can be enabled per request:

- `include_source_path=true`
- `include_document_metadata=true`
- `include_page_metadata=true`
- `include_item_metadata=true`
- `include_processing_metadata=true`

### Markdown Output

Clean GFM (GitHub Flavored Markdown) with proper table formatting, headers preserved from the source document, and hierarchical structure maintained.

## Pricing

Credit-based pricing with a free tier and paid plans for higher volume. One page equals one credit. See [docule.dev](https://docule.dev) for current plans and pricing.
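
Since one page equals one credit, checking whether a batch fits a plan is simple arithmetic. This sketch assumes each plan's monthly page allowance from the rate-limit table is also its monthly credit budget and that every submitted page consumes exactly one credit; whether automatic re-processing of failed pages costs extra is not stated here. The plan figures are copied from this page and may change, so treat docule.dev as authoritative. `credits_needed` and `fits_quota` are hypothetical helpers.

```python
# Monthly page allowance per plan, copied from the rate-limit table.
# Assumption: one page = one credit, so these are also monthly credit budgets.
MONTHLY_PAGES = {"free": 100, "starter": 2_000, "pro": 6_000, "business": 15_000}


def credits_needed(page_counts):
    """Total credits for a batch: one credit per page across all PDFs."""
    return sum(page_counts)


def fits_quota(plan, page_counts, used_this_month=0):
    """Return (fits, remaining), where remaining is the budget left after the batch.

    A negative remaining value is the shortfall in credits.
    """
    remaining = MONTHLY_PAGES[plan] - used_this_month - credits_needed(page_counts)
    return remaining >= 0, remaining


# Three PDFs of 12, 48 and 310 pages on a Starter plan with 1,500 pages already used.
print(fits_quota("starter", [12, 48, 310], used_this_month=1_500))  # → (True, 130)
```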
## Links

- Website: https://docule.dev
- Documentation: https://docule.dev/docs
- API Base: https://docule.dev/api/v1
- Privacy Policy: https://docule.dev/privacy
- Terms of Service: https://docule.dev/terms
- Contact: support@docule.dev