page-xml topic

Repositories tagged with page-xml

PdfPig

1.5k stars · 220 forks

Read and extract text and other content from PDFs in C# (port of PDFBox)

DocumentLayoutAnalysis

530 stars · 59 forks

Document layout analysis resources for development with PdfPig.

ocr-fileformat

175 stars · 23 forks

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

kraken

661 stars · 122 forks

OCR engine for all the languages

ocr-conversion

71 stars · 3 forks

Conversions between various OCR formats

dinglehopper

55 stars · 12 forks

An OCR evaluation tool
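Several of the tools above read or write PAGE XML, so a small example of the format may help. What follows is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It pulls the line-level text out of a PAGE document. It assumes the 2019-07-15 PAGE namespace, since files produced by older tools may use a different dated namespace. The function name text_lines and the SAMPLE document are hypothetical and written by hand for illustration. SAMPLE is trimmed down (it omits the Metadata element, for example) and is not schema-valid. This is not code from any of the listed projects.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 namespace; older files may use another dated namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}


def text_lines(xml_text: str) -> list[str]:
    """Return the Unicode text of every TextLine, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter(f"{{{PAGE_NS}}}TextLine"):
        # Only the TextEquiv that is a direct child of TextLine, not those on nested Words.
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


# Hand-written, trimmed sample (not schema-valid: Metadata is omitted).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page1.png" imageWidth="1000" imageHeight="1400">
    <TextRegion id="r1">
      <Coords points="10,10 990,10 990,200 10,200"/>
      <TextLine id="r1l1">
        <Coords points="10,10 990,10 990,60 10,60"/>
        <TextEquiv><Unicode>Hello PAGE</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="10,70 990,70 990,120 10,120"/>
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(text_lines(SAMPLE))  # ['Hello PAGE', 'second line']
```

Real-world files often carry several TextEquiv elements per line, distinguished by an index attribute, as well as word- and glyph-level text. A fuller reader would need to choose among these rather than take the first match, as this sketch does.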