Extract Text from PDF – Free Online Tool
Extract the text content of PDF files and copy it out easily, without installing any software.
Example Output
Extracted text in reading order with optional page markers. Works for text-based PDFs; scanned PDFs need OCR first.
whitepaper.txt — full UTF-8 text, paragraphs preserved, page numbers as `--- Page N ---`
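If you post-process the downloaded file, the page markers make it straightforward to split the text back into pages. A minimal Python sketch, assuming each marker sits on its own line exactly as `--- Page N ---` (the `split_pages` helper is illustrative and not part of the tool):

```python
import re

# Assumed marker format: a line containing exactly "--- Page N ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---$", re.MULTILINE)

def split_pages(text: str) -> dict[int, str]:
    """Map each page number to the text that follows its marker."""
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        # A page ends where the next marker starts, or at the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[start:end].strip()
    return pages

sample = "--- Page 1 ---\nIntroduction\n\nFirst paragraph.\n--- Page 2 ---\nSecond page text.\n"
pages = split_pages(sample)
print(pages[2])  # prints "Second page text."
```

Text without any markers yields an empty dictionary, so callers can fall back to treating the file as a single page.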
What is Extract Text from PDF?
Extract all text from a PDF into a clean .txt or .md file — paragraphs in reading order, optional page-break markers, and UTF-8 throughout so non-Latin scripts (Vietnamese, CJK, Arabic) come out intact. Ideal for feeding LLMs, building a searchable archive, or copying content out of a locked PDF.
Why use this tool?
- Instant results — no waiting on a server or upload progress bar
- Touch-friendly UI that works well on phones for on-the-go extraction
- No registration, account, or installation required
- UTF-8 output, so Vietnamese, CJK, and Arabic text comes out intact
- No file upload — confidential reports never leave your computer
How to use
- Select your PDF file (it stays on your computer)
- Click "Extract Text"
- View and copy extracted text
- Download as TXT file
Examples
LLM context preparation
A 200-page report becomes a token-efficient .txt you can paste into Claude/GPT for summarisation.
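A long report can still exceed a model's context window, so the extracted text is often split into chunks before pasting. A rough Python sketch, assuming paragraphs are separated by blank lines and using a character budget as a stand-in for tokens (the `chunk_paragraphs` helper and the budget are illustrative; real token counts depend on the model's tokenizer):

```python
def chunk_paragraphs(text: str, max_chars: int = 8000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # +2 accounts for the blank line that rejoins paragraphs in a chunk.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks

text = "alpha\n\nbravo\n\ncharlie"
print(chunk_paragraphs(text, max_chars=12))  # ['alpha\n\nbravo', 'charlie']
```

Splitting on paragraph boundaries rather than at a fixed offset keeps sentences intact, which tends to matter more for summarisation than hitting the budget exactly.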
Searchable research archive
Extract text from hundreds of academic PDFs to make the whole library `grep`-able.
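Once the extracted files sit in one folder, a few lines of Python give a portable equivalent of a case-insensitive `grep`. This is a sketch under assumptions: the files are UTF-8 `.txt` files in a single folder, and the `search_archive` helper and sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

def search_archive(folder: str, needle: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every case-insensitive hit."""
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if needle.lower() in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits

# Demo with two throwaway files standing in for extracted papers.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "paper_a.txt").write_text(
        "Attention is all you need.\nNo match here.\n", encoding="utf-8"
    )
    Path(tmp, "paper_b.txt").write_text("We study ATTENTION heads.\n", encoding="utf-8")
    hits = search_archive(tmp, "attention")
    for name, number, line in hits:
        print(f"{name}:{number}: {line}")
```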
Bypassing copy restrictions
Some PDFs disable copy-paste; extraction reads the underlying text stream regardless. Only do this with files you own or have the right to copy.
Common use cases
- Feeding PDFs to LLMs for summarisation/QA
- Building searchable text corpora from PDFs
- Migrating PDF content into a CMS
- Cross-referencing facts across multiple documents
- Translating large PDFs (paste text into a translator)
Troubleshooting
- Output is empty or gibberish: the PDF consists of scanned images, not text. Use an OCR tool first (Tesseract or an OCR-capable PDF tool), then re-extract.
- Two-column layouts mix lines together: enable "respect columns" mode. The default linear extraction can interleave columns, whereas column mode walks each column top-to-bottom first.
- Vietnamese / CJK characters are mojibake: the PDF embeds the font but uses a custom encoding. Toggle "use Unicode mapping" (cmap-aware); most modern PDFs ship a `/ToUnicode` table.
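To illustrate the two-column item above: linear extraction orders fragments by vertical position only, so lines from both columns alternate. A simplified Python sketch of column-first ordering, assuming each fragment carries an `x`/`y` position with `y` growing downward and that the page splits into two columns at its horizontal midpoint (the `Fragment` type and this splitting rule are illustrative assumptions, not the tool's actual algorithm):

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment
    y: float  # top edge; assumed to grow downward

def linear_order(fragments: list[Fragment]) -> list[str]:
    """Sort by vertical position only, which interleaves the columns."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]

def column_order(fragments: list[Fragment], page_width: float) -> list[str]:
    """Read the left column top to bottom, then the right column."""
    middle = page_width / 2
    left = [f for f in fragments if f.x < middle]
    right = [f for f in fragments if f.x >= middle]
    ordered = sorted(left, key=lambda f: f.y) + sorted(right, key=lambda f: f.y)
    return [f.text for f in ordered]

page = [
    Fragment("L1", x=50, y=100), Fragment("R1", x=320, y=100),
    Fragment("L2", x=50, y=120), Fragment("R2", x=320, y=120),
]
print(linear_order(page))                  # ['L1', 'R1', 'L2', 'R2']
print(column_order(page, page_width=600))  # ['L1', 'L2', 'R1', 'R2']
```

Real layouts need more than a midpoint split (full-width headings, three columns, sidebars), so treat this as a picture of the idea, not a drop-in replacement.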
Frequently Asked Questions
Can I extract text from a scanned PDF?
No. Without OCR, scanned PDFs contain no extractable text, just images. Run the PDF through an OCR tool first to make the text machine-readable.
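One quick way to tell whether a file needs OCR is to check how much readable text the extraction produced. A heuristic Python sketch, where the `looks_scanned` helper and both thresholds are arbitrary assumptions to tune for your own documents:

```python
def looks_scanned(text: str, min_chars: int = 50, min_ratio: float = 0.6) -> bool:
    """Guess whether extracted text is too sparse or garbled to be usable."""
    visible = [ch for ch in text if not ch.isspace()]
    if len(visible) < min_chars:
        return True  # little or no text, typical of image-only pages
    # str.isalnum is Unicode-aware, so letters in any script count as readable.
    readable = sum(ch.isalnum() for ch in visible)
    return readable / len(visible) < min_ratio

print(looks_scanned(""))  # True
print(looks_scanned("The quick brown fox jumps over the lazy dog. " * 3))  # False
```

A `True` result is only a hint: a page that is mostly tables of symbols or punctuation could trip the ratio check even though its text layer is fine.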