text-extraction topic
text-extraction repositories
pdftools
501 Stars, 69 Forks
Text Extraction, Rendering and Converting of PDF Documents
unidoc
705 Stars, 87 Forks
This repository has moved! https://github.com/unidoc/unipdf
datashare
555 Stars, 50 Forks
A self-hosted search engine for documents.
nlp
388 Stars, 34 Forks
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
CUTIE
157 Stars, 79 Forks
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
ocr
103 Stars, 8 Forks
Simple app to extract text from pictures using Tesseract
pd3f
277 Stars, 35 Forks
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
extend
166 Stars, 10 Forks
Entity Disambiguation as text extraction (ACL 2022)
php-apache-tika
111 Stars, 22 Forks
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
PDFIO.jl
124 Stars, 13 Forks
PDF Reader Library for Native Julia.