# Docule

> PDF parsing API that converts complex documents into structured Markdown, JSON, and plain text. Specializes in financial tables, annual reports, invoices, and multi-column layouts.

Docule is a document intelligence API at [docule.dev](https://docule.dev). It parses PDFs into clean, structured output using a multi-stage processing pipeline.

## What Docule Does

- Converts PDF documents to structured Markdown, JSON, or plain text
- Extracts complex financial tables with correct column alignment and hierarchy
- Handles side-by-side balance sheets, 12-column segment reports, nested sub-totals
- Parses invoices with line items, totals, VAT, and vendor metadata
- Processes annual reports, bank statements, contracts, sustainability reports, SEC filings
- Provides RAG-ready page-level chunks with keyword metadata
- Validates extracted data: column consistency, header detection, sum verification

## Key Features

- **Adaptive pipeline**: Simple pages use fast text extraction; complex layouts automatically escalate to higher-fidelity processing
- **Financial table specialization**: Built on the hardest parsing task: income statements, balance sheets, cash flows
- **Multiple output formats**: Markdown for retrieval/reading, JSON with strict content-first defaults plus optional metadata toggles
- **Batch processing**: Submit thousands of documents asynchronously at reduced cost
- **Built-in validation**: Pages that fail quality checks are re-processed at higher fidelity automatically

## API

REST API with JSON responses. Authentication via `X-API-Key` header. Base URL: `https://docule.dev/api/v1`.
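
As a minimal sketch of the authentication scheme above, the following shows one way a client might attach the API key using only Python's standard library. Only the header name and base URL come from this document; the helper name `build_request` and the placeholder key are illustrative assumptions, not part of any Docule SDK.

```python
import urllib.request

BASE_URL = "https://docule.dev/api/v1"  # base URL as stated in this document


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return an unsent GET request for `path` with the X-API-Key header set.

    `build_request` is a hypothetical helper used for illustration only.
    """
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


# "YOUR_API_KEY" is a placeholder; substitute a real key before sending.
req = build_request("/jobs", "YOUR_API_KEY")
# Sending the request would look like: urllib.request.urlopen(req, timeout=30)
```

Note that `urllib` stores the header name as `X-api-key` internally. HTTP header names are case-insensitive, so this does not change what the server receives.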

- `POST /api/v1/parse`: Upload a PDF, receive a `job_id` for async processing
- `GET /api/v1/status/{job_id}`: Check parse job status and progress
- `GET /api/v1/result/{job_id}`: Retrieve the parsed result (json, md, or txt)
- `GET /api/v1/result/{job_id}/raw`: Download the raw output file
- `GET /api/v1/jobs`: List recent jobs for the authenticated account
- `GET /api/v1/health`: Service health (no auth required)

JSON responses default to strict content export (`pages[].text`, `pages[].markdown`, `pages[].items`) with optional query toggles:

- `include_source_path`
- `include_document_metadata`
- `include_page_metadata`
- `include_item_metadata`
- `include_processing_metadata`

Markdown/text responses support optional page boundaries via `include_page_markers=true`.

## Supported Document Types

Annual reports, invoices, bank statements, contracts, sustainability reports, research papers, tax filings, insurance claims, loan applications, medical records, SEC filings, board materials.

## Pricing

Credits-based pricing with a free tier. See [docule.dev](https://docule.dev) for current plans.

## Links

- [Documentation](https://docule.dev/docs)
- [API Reference](https://docule.dev/docs)
- [Privacy Policy](https://docule.dev/privacy)
- [Terms of Service](https://docule.dev/terms)
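
The asynchronous flow from the API section (submit, poll `status/{job_id}`, then fetch `result/{job_id}`) could be sketched roughly as follows. This is an illustration under stated assumptions rather than a reference client: the status payload's `status` field and its `completed`/`failed` values are not documented here and are assumed, as is passing `true` for toggles other than `include_page_markers`. The names `result_url`, `wait_for_job`, and `job-123` are hypothetical, and HTTP access is injected as a callable so the logic can be exercised without a network.

```python
import time
from typing import Callable
from urllib.parse import urlencode

BASE_URL = "https://docule.dev/api/v1"  # base URL as stated in this document


def result_url(job_id: str, **toggles: bool) -> str:
    """Build the result URL, adding only the toggles that are switched on.

    ASSUMPTION: every toggle accepts the literal value "true", as shown
    in this document for include_page_markers.
    """
    query = urlencode({name: "true" for name, on in toggles.items() if on})
    url = f"{BASE_URL}/result/{job_id}"
    return f"{url}?{query}" if query else url


def wait_for_job(
    job_id: str,
    fetch_json: Callable[[str], dict],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Poll the status endpoint until the job reaches a terminal state.

    ASSUMPTION: the payload has a "status" field whose terminal values are
    "completed" and "failed"; the actual schema is not given in this document.
    """
    for _ in range(max_polls):
        payload = fetch_json(f"{BASE_URL}/status/{job_id}")
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Stubbed fetcher standing in for real authenticated HTTP calls.
responses = iter([{"status": "processing"}, {"status": "completed"}])
final = wait_for_job("job-123", lambda url: next(responses), poll_seconds=0)
url = result_url("job-123", include_page_metadata=True, include_item_metadata=False)
```

Injecting `fetch_json` keeps the polling logic independent of any particular HTTP library; a real client would pass a function that sends the `X-API-Key` header and decodes the JSON response.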