document-processing topic

List document-processing repositories

dhSegment

368 stars · 115 forks

Generic framework for historical document processing

formkiq-core

94 stars · 15 forks

A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly int...

tokyo

18 stars · 0 forks

tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.

lyx

35 stars · 7 forks

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily. not affiliated with lyx.org.)

proceedings

31 stars · 1 fork

Semantic extraction from conference proceedings.

PDFSegmenter

19 stars · 3 forks

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are extracted and formatted as CSV.

awesome-datasets

28 stars · 6 forks

A comprehensive list of annotated training datasets classified by use case.

project-lakechain

61 stars · 7 forks

⚡ Cloud-native, AI-powered document processing pipelines on AWS.

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates...