text-extraction topic

aut

133 stars, 33 forks

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

benchmarks

179 stars, 9 forks

Benchmarking PDF libraries

pd3f-core

32 stars, 8 forks

📑 Python Package to reconstruct the original continuous text from PDFs with language models

wagtail_textract

31 stars, 13 forks

Text extraction for Wagtail document search

office-text-extractor

46 stars, 4 forks

Yet another library to extract text from MS Office and PDF files

docwire

52 stars, 12 forks

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...

scummtr

28 stars, 4 forks

Fan translation tools for LucasArts SCUMM games

pdf-text-data-extractor

67 stars, 41 forks

PDF text data extraction web app with OCR for scanned documents

video_text_detection

15 stars, 3 forks

Bachelor Thesis | Text extraction from complex video scenes

tesseract-ocr-wrapper

17 stars, 4 forks

A highly efficient Python wrapper for tesseract-ocr.