pdf-to-text topic

List pdf-to-text repositories

papercast

Stars: 32, Forks: 1

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF files with GROBID and LangChain, then listen to them as a podcast. Customize your own pipelines...

ocr-python

Stars: 74, Forks: 11

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

adobe-pdf-library-samples

Stars: 80, Forks: 62

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

pdf-text-data-extractor

Stars: 67, Forks: 41

PDF text data extraction web app with OCR for scanned documents

API-Tabua-Mare

Stars: 16, Forks: 8

API for obtaining daily tide table (Tábua de Maré) data, using web scraping with PHP.

pdf-to-txt-python

Stars: 19, Forks: 13

Simple PDF-to-text conversion with Python, using PDFtk and PyPDF2

ragflow

Stars: 39.3k, Forks: 3.5k, Watchers: 194

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

docling

Stars: 16.4k, Forks: 834, Watchers: 81

Get your documents ready for gen AI