layout-analysis topic
PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
DocumentLayoutAnalysis
Document Layout Analysis resources and repos for development with PdfPig.
layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
kraken
OCR engine for all the languages
PdfPigMLNetBlockClassifier
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as either title, text, list, table and...
PDFSegmenter
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are retrieved in CSV format.
detectron2-publaynet
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
publaynet-models
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
SelfDocSeg
[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)
HJDataset
A Large Dataset of Historical Japanese Documents with Complex Layouts