pdf-to-text topic
papercast
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines...
ocr-python
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
adobe-pdf-library-samples
Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library
pdf-text-data-extractor
PDF text data extraction web app with OCR for scanned documents
API-Tabua-Mare
API for obtaining daily Tide Table data, using web scraping with PHP.
pdf-to-txt-python
Simple PDF-to-text conversion in Python using PDFtk and PyPDF2
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.