extract-text topic
cat
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
open-semantic-etl
Python-based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...
textract
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
pd3f
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
fulltext
⚠️ ARCHIVED ⚠️ Search across and get full text for OA & closed journals
tikaondotnet
Use the Java Tika text extraction library on the .NET platform
PDFs-TextExtract
Text extraction from multiple and large PDF documents.
pdf-to-text
Read PDF files in JavaScript
tokyo
tokyo, a REST API that, when given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.