extract-text topic

List extract-text repositories

cat

90 stars · 18 forks

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

open-semantic-etl

252 stars · 68 forks

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelin...

textract

1.6k stars · 186 forks

Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

pd3f

277 stars · 35 forks

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

fulltext

270 stars · 46 forks

⚠️ ARCHIVED ⚠️ Search across and get full text for OA & closed journals

tikaondotnet

193 stars · 74 forks

Use the Java Tika text extraction library on the .NET platform

PDFs-TextExtract

127 stars · 64 forks

Text extraction from multiple and large PDF documents.

rtika

54 stars · 8 forks

R Interface to Apache Tika

pdf-to-text

76 stars · 33 forks

Read PDF files in JavaScript

tokyo

18 stars · 0 forks

tokyo, a REST API that, given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.