text-extraction topic

List text-extraction repositories

pdftools

501 stars, 69 forks

Text Extraction, Rendering and Converting of PDF Documents

unidoc

705 stars, 87 forks

This repository has moved! https://github.com/unidoc/unipdf

datashare

555 stars, 50 forks

A self-hosted search engine for documents.

nlp

388 stars, 34 forks

[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

CUTIE

157 stars, 79 forks

CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

ocr

103 stars, 8 forks

Simple app to extract text from pictures using Tesseract

pd3f

277 stars, 35 forks

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

extend

166 stars, 10 forks

Entity Disambiguation as text extraction (ACL 2022)

php-apache-tika

111 stars, 22 forks

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats

PDFIO.jl

124 stars, 13 forks

PDF Reader Library for Native Julia.