document-analysis topic
PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
robin
RObust document image BINarization
Curve-Text-Detector
This repository provides training and testing code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.
PICK-pytorch
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
pandora
Pandora is an analysis framework that determines whether a file is suspicious and conveniently displays the results
local_adaptive_binarization
Local adaptive image binarization
docExtractor
(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
docvqa
Document Visual Question Answering
LiLT
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)