# Docule

> PDF parsing API that converts complex documents into structured Markdown, JSON, and plain text. Specializes in financial tables, annual reports, invoices, and multi-column layouts.

Docule is a document intelligence API at [docule.dev](https://docule.dev). It parses PDFs into clean, structured output using a multi-stage processing pipeline.

## What Docule Does

- Converts PDF documents to structured Markdown, JSON, or plain text
- Extracts complex financial tables with correct column alignment and hierarchy
- Handles side-by-side balance sheets, 12-column segment reports, nested sub-totals
- Parses invoices with line items, totals, VAT, and vendor metadata
- Processes annual reports, bank statements, contracts, sustainability reports, SEC filings
- Provides RAG-ready page-level chunks with keyword metadata
- Validates extracted data: column consistency, header detection, sum verification
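
To illustrate what checks such as column consistency and sum verification can look like, here is a minimal sketch. It is not Docule's implementation: the function names, the string-based inputs, and the 0.01 tolerance are assumptions made for the example.

```python
from decimal import Decimal


def column_count_consistent(rows):
    """Return True when every row in a table has the same number of cells."""
    return len({len(row) for row in rows}) <= 1


def sum_matches(items, reported_total, tolerance=Decimal("0.01")):
    """Return True when the line items add up to the reported total.

    Amounts are passed as strings and summed as Decimal to avoid
    floating-point rounding. The tolerance is an assumed value.
    """
    computed = sum((Decimal(item) for item in items), Decimal("0"))
    return abs(computed - Decimal(reported_total)) <= tolerance
```

A caller could run both checks on each extracted table and flag the page when either one fails.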

## Key Features

- **Adaptive pipeline**: Simple pages use fast text extraction; complex layouts automatically escalate to higher-fidelity processing
- **Financial table specialization**: Tuned for the hardest table-parsing cases: income statements, balance sheets, and cash flow statements
- **Multiple output formats**: Markdown for retrieval/reading, JSON with strict content-first defaults plus optional metadata toggles
- **Batch processing**: Submit thousands of documents asynchronously at reduced cost
- **Built-in validation**: Pages that fail quality checks are re-processed at higher fidelity automatically
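
The escalation behavior described above follows a general pattern: try the cheapest extractor first and move to a more expensive one only when the output fails a quality check. The sketch below shows that pattern in generic form. It does not describe Docule's internals, and `parse_with_escalation`, `extractors`, and `passes_checks` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple

# An extractor takes raw page bytes and returns extracted text.
Extractor = Callable[[bytes], str]


def parse_with_escalation(
    page: bytes,
    extractors: Sequence[Extractor],
    passes_checks: Callable[[str], bool],
) -> Tuple[int, str]:
    """Run extractors from fastest to highest fidelity.

    Returns (level, text) for the first output that passes the checks.
    If none pass, returns the output of the last (highest-fidelity) level.
    """
    if not extractors:
        raise ValueError("at least one extractor is required")
    text = ""
    for level, extract in enumerate(extractors):
        text = extract(page)
        if passes_checks(text):
            return level, text
    return len(extractors) - 1, text
```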

## API

REST API with JSON responses. Authentication via the `X-API-Key` header. Base URL: `https://docule.dev/api/v1`. The endpoint paths below are written in full, relative to `https://docule.dev`.

- `POST /api/v1/parse` — Upload a PDF, receive a `job_id` for async processing
- `GET /api/v1/status/{job_id}` — Check parse job status and progress
- `GET /api/v1/result/{job_id}` — Retrieve the parsed result (json, md, or txt)
- `GET /api/v1/result/{job_id}/raw` — Download the raw output file
- `GET /api/v1/jobs` — List recent jobs for the authenticated account
- `GET /api/v1/health` — Service health (no auth required)
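
Because parsing is asynchronous, a client typically polls the status endpoint until the job finishes and then fetches the result. The sketch below covers the polling step using only the Python standard library. The `status` field and its `completed` and `failed` values are assumptions, since the response schema is not specified here. The upload step is left out because the request body format for `POST /api/v1/parse` is not documented in this file.

```python
import json
import time
import urllib.request

BASE_URL = "https://docule.dev/api/v1"

# ASSUMPTION: these status values are illustrative; check the API reference.
TERMINAL_STATES = {"completed", "failed"}


def auth_headers(api_key):
    """Headers for authenticated endpoints (the API uses X-API-Key)."""
    return {"X-API-Key": api_key}


def get_json(url, api_key):
    """GET a URL with the API key and decode the JSON response body."""
    request = urllib.request.Request(url, headers=auth_headers(api_key))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(job_id, api_key, fetch=get_json, interval=2.0,
                 max_polls=30, sleep=time.sleep):
    """Poll GET /status/{job_id} until the job reaches a terminal state.

    `fetch` and `sleep` are injectable so the loop can be tested offline.
    """
    url = f"{BASE_URL}/status/{job_id}"
    for _ in range(max_polls):
        payload = fetch(url, api_key)
        # ASSUMPTION: the payload carries a top-level "status" field.
        if payload.get("status") in TERMINAL_STATES:
            return payload
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Once `wait_for_job` returns, the parsed output can be requested from `GET /api/v1/result/{job_id}` with the same headers.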

JSON responses default to a strict content-only export (`pages[].text`, `pages[].markdown`, `pages[].items`). Optional query toggles add metadata:
- `include_source_path`
- `include_document_metadata`
- `include_page_metadata`
- `include_item_metadata`
- `include_processing_metadata`

Markdown/text responses support optional page boundaries via `include_page_markers=true`.
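
The query options above can be combined into a result URL with a small helper like the one sketched here. The toggle names come from this document. Passing `true` as the value of the metadata toggles is an assumption modeled on `include_page_markers=true`, and `result_url` is a hypothetical helper name. Selecting the output format (json, md, or txt) is omitted because the mechanism is not specified here.

```python
from urllib.parse import urlencode

BASE_URL = "https://docule.dev/api/v1"

# Metadata toggles listed for JSON responses.
JSON_TOGGLES = {
    "include_source_path",
    "include_document_metadata",
    "include_page_metadata",
    "include_item_metadata",
    "include_processing_metadata",
}


def result_url(job_id, toggles=(), page_markers=False):
    """Build the URL for GET /result/{job_id} with optional query toggles."""
    unknown = set(toggles) - JSON_TOGGLES
    if unknown:
        raise ValueError(f"unknown toggles: {sorted(unknown)}")
    # ASSUMPTION: each toggle is enabled by passing the value "true".
    params = {name: "true" for name in sorted(toggles)}
    if page_markers:
        params["include_page_markers"] = "true"
    url = f"{BASE_URL}/result/{job_id}"
    return f"{url}?{urlencode(params)}" if params else url
```

Rejecting unknown toggle names in the helper catches typos before a request is sent.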

## Supported Document Types

Annual reports, invoices, bank statements, contracts, sustainability reports, research papers, tax filings, insurance claims, loan applications, medical records, SEC filings, board materials.

## Pricing

Credits-based pricing with a free tier. See [docule.dev](https://docule.dev) for current plans.

## Links

- [Documentation](https://docule.dev/docs)
- [API Reference](https://docule.dev/docs)
- [Privacy Policy](https://docule.dev/privacy)
- [Terms of Service](https://docule.dev/terms)