document-processing topic
dhSegment
Generic framework for historical document processing
pandoc-include
An include filter for Pandoc
formkiq-core
A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly int...
tokyo
tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
lyx
Unofficial mirror of git://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).
proceedings
Semantic extraction from conference proceedings.
PDFSegmenter
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are retrieved and formatted as CSV.
awesome-datasets
A comprehensive list of annotated training datasets classified by use case.
project-lakechain
⚡ Cloud-native, AI-powered document processing pipelines on AWS.
enhanced-document-understanding-on-aws
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates...