Upload PDF, Word, Excel or email files and receive structured data as Excel, CSV, JSON or UBL. Process up to 10 files at once. Six extraction types: invoices, floor plans, forms, tables, emails and legal documents.
Choose the type of data you want to pull from your PDFs.
Extract room names, dimensions, and areas from architectural drawings and blueprints.
Extract line items, totals, tax amounts, and vendor information from invoices.
Extract field names, values, checkboxes, and sections from filled-in forms.
Extract sender, recipients, subject, and body from archived emails.
Extract tabular data with automatic header detection and data typing.
Analyse contracts and agreements: parties, clauses, key terms, and risk flags.
Define your own extraction rules with AI-powered prompts.
Three steps from document to structured data.
Upload PDF, Word, Excel or email files. Up to 10 files at once, maximum 50 MB per file.
The document is analysed and relevant fields are detected. Email files are parsed directly; other document types are processed visually.
Export as Excel, CSV, JSON or UBL. Each extraction type returns the fields specific to that document type.
# Extract invoice data from PDF
curl -X POST https://api.pdfen.com/v2/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@invoice.pdf" \
-F "type=invoice" \
-F "format=json"
# Response
{
"vendor": "Acme Corp B.V.",
"invoice_number": "INV-2025-0042",
"total": 1512.50,
"vat": 262.50,
"line_items": [...]
}
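The amounts in the sample response can be sanity-checked with a little arithmetic. The sketch below assumes that `total` includes VAT, which the response itself does not state; `vat_summary` is a hypothetical helper, not part of the API.

```shell
# vat_summary TOTAL VAT -> prints "<net amount> <effective VAT rate in %>"
# Assumption: TOTAL is the gross amount (VAT included).
vat_summary() {
  awk -v total="$1" -v vat="$2" 'BEGIN {
    net = total - vat
    if (net <= 0) { exit 1 }        # guard against division by zero
    printf "%.2f %.1f\n", net, vat / net * 100
  }'
}

vat_summary 1512.50 262.50   # prints: 1250.00 21.0
```

Under that assumption the sample invoice has a net amount of 1250.00 and an effective VAT rate of 21%.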
Integrate data extraction into your own application. Submit files via the REST API and receive structured data back as JSON.
Process multiple files per API call. Supports PDF, Word, Excel and email.
Receive a callback when extraction is complete, so no polling is needed.
Get results as JSON, CSV, Excel or UBL (for invoices).
Export your extracted data in the format that fits your workflow.
Spreadsheet-ready with formatting and multiple sheets.
Universal format for databases and data tools.
Structured data for APIs and applications.
Enterprise format for system integrations.
European e-invoicing standard (EN 16931).
You can upload PDF, Word (.doc, .docx), Excel (.xls, .xlsx) and email files (.eml, .msg). Word and Excel files are automatically converted for processing; email files are parsed directly without conversion. Up to 10 files at once, maximum 50 MB per file.
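The limits above can be checked locally before uploading. This is a minimal sketch: `check_upload` is a hypothetical helper, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```shell
# Pre-flight check against the stated limits: at most 10 files,
# at most 50 MB each, and only the listed file extensions.
MAX_FILES=10
MAX_BYTES=$((50 * 1024 * 1024))   # assumption: 1 MB = 1024 * 1024 bytes

check_upload() {
  local f ext size
  if [ "$#" -eq 0 ]; then
    echo "no files given" >&2
    return 1
  fi
  if [ "$#" -gt "$MAX_FILES" ]; then
    echo "too many files: $# (limit $MAX_FILES)" >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "not a file: $f" >&2
      return 1
    fi
    # Compare the extension case-insensitively.
    ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
      pdf|doc|docx|xls|xlsx|eml|msg) ;;
      *) echo "unsupported file type: $f" >&2; return 1 ;;
    esac
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -gt "$MAX_BYTES" ]; then
      echo "file too large: $f ($size bytes)" >&2
      return 1
    fi
  done
}

# Usage: check_upload invoice.pdf contract.docx && echo "ready to upload"
```

The function returns non-zero on the first violation, so it can gate an upload command with `&&`.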
It depends on the extraction type. Email extraction is 100% accurate because files are parsed directly. PDF forms with built-in fields (AcroForm) are also 100% accurate. For visual extraction (invoices, tables, legal), each extracted field includes a confidence score (high/medium/low) so you can see where manual review may be useful.
Visual extraction works with documents in all common languages, including Dutch, English, German, French and Spanish. Extracted data fields are standardised regardless of the source language. Email extraction is language-independent.
Costs vary by extraction type. PDF forms with AcroForm fields cost 1 credit. Invoices and tables cost 2 credits per PDF (3 per Word/Excel file). Legal documents cost 3 credits per PDF (4 per Word file). Emails cost 2 credits per file. New users receive 15 free credits on registration.
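To estimate what a batch will cost, the prices above can be put into a small lookup. In this sketch the type labels (`acroform`, `table`, `legal`, `email`) are hypothetical names chosen for illustration and are not confirmed API values. Combinations with no stated price, such as floor plans, are rejected rather than guessed.

```shell
# credits_for TYPE EXT -> prints the credit cost of one file.
credits_for() {
  case "$1:$2" in
    acroform:pdf)                                        echo 1 ;;
    invoice:pdf|table:pdf)                               echo 2 ;;
    invoice:doc|invoice:docx|invoice:xls|invoice:xlsx)   echo 3 ;;
    table:doc|table:docx|table:xls|table:xlsx)           echo 3 ;;
    legal:pdf)                                           echo 3 ;;
    legal:doc|legal:docx)                                echo 4 ;;
    email:eml|email:msg)                                 echo 2 ;;
    *) echo "no price stated for $1/$2" >&2; return 1 ;;
  esac
}

# estimate_credits TYPE:EXT... -> prints the total for a batch.
estimate_credits() {
  local total=0 item c
  for item in "$@"; do
    c=$(credits_for "${item%%:*}" "${item##*:}") || return 1
    total=$((total + c))
  done
  echo "$total"
}

estimate_credits invoice:pdf invoice:xlsx legal:docx email:eml   # prints 11
```

That batch costs 2 + 3 + 4 + 2 = 11 credits, which fits within the 15 free credits given on registration.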
Yes. Via the REST API you can submit files and receive structured results as JSON. Webhooks notify you when processing is complete. The API supports all six extraction types and all file formats.
Create a free account and receive 15 credits to get started.