pdf-to-text topic
pd3f
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
SciTSR
Table structure recognition dataset from the paper "Complicated Table Structure Recognition"
PDF-TOOLBOX
A Multi Purpose PDF Toolkit
converter
Standalone .NET converter library that requires neither Adobe Acrobat components nor Microsoft Office Interop assemblies, for converting PDF, DOCX, XLSX, HTML, image, CSV, RTF, and TXT files in the .NET Framework
pdf-text-extraction
CLI for extracting text from PDF files (and possibly tables)
Docotic.Pdf.Samples
C# and VB.NET samples for Docotic.Pdf library
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Extract-Data-From-PDF-In-Python
Batch-convert PDF to text and extract data from PDF files in Python
nocodefunctions-web-app
The code base of the front-end of nocodefunctions.com