text-extraction topic
pdftools
Text Extraction, Rendering and Conversion of PDF Documents
unidoc
This repository has moved! https://github.com/unidoc/unipdf
datashare
A self-hosted search engine for documents.
nlp
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
CUTIE
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
ocr
Simple app to extract text from pictures using Tesseract
pd3f
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
extend
Entity Disambiguation as text extraction (ACL 2022)
php-apache-tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
PDFIO.jl
PDF Reader Library for Native Julia.