OCR conversion
Collection of scripts and stylesheets for conversion between various OCR formats.
You may also want to check out the excellent ocr-fileformat by @UB-Mannheim.
ABBYY
- abbyy2hocr.xsl - ABBYY FineReader XML to hOCR converter by @Rod Page
- abbyy2hocr.xsl - ABBYY FineReader XML to hOCR converter by @Rod Page, updated by @OCR-D
- abbyy-to-hocr - ABBYY FineReader XML to hOCR converter by @merlijn
- teip5-v5.xsl - Transform ABBYY FineReader XML into TEI @UPEI
- ABBYY_to_TEI_by_XMLReader.php - Convert ABBYY XML to TEI using PHP's XMLReader @able-project
- ocr_to_teifacsimile.xsl - Generate page-level TEI facsimile from ABBYY OCR XML or METS/ALTO @readux
- AbbyyToAlto.php - Convert ABBYY FineReader XML into ALTO XML with PHP 5 @ironymark
- AbbyyToAltoConverter.java - Java library to convert abbyy.xml (v10) to alto.xml (v2) @abbyy-to-alto
ALTO
- alto2tei.xsl - Output TEI from ALTO input @OpenConvert
- AltoToTeiA.xsl - For Gale OCR XML or 18thConnect Typewright XML files @typewright
- ocr_to_teifacsimile.xsl - Generate page-level TEI facsimile from ABBYY OCR XML or METS/ALTO @readux
- alto2hocr.xsl - Convert ALTO 2.0 / ALTO 2.1 to hOCR @filak
- alto2text.xsl - Convert ALTO 2.0 / ALTO 2.1 to plain text @filak
- alto_ocr_text.py - Extract the text from an ALTO file and write it to stdout @cneud
- ALTO2HTML.bat - Batch script to convert ALTO files to HTML @altomator
- dinglehopper-extract - Extract the text from ALTO and PAGE XML files @qurator-spk
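For orientation, ALTO-to-text extraction of the kind several tools above perform can be sketched with Python's standard library alone. This is a minimal illustration and not the code of any listed tool: the function names `local_name` and `alto_to_text` and the sample document are made up here, and the sketch assumes words are stored in the `CONTENT` attribute of `String` elements inside `TextLine` elements. It ignores hyphenation (`HYP`) elements and reading order.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml: str) -> str:
    """Return one line of plain text per ALTO TextLine element."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is the CONTENT attribute of a String child; SP and HYP are skipped.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


# Hypothetical, stripped-down ALTO document used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Hello"/><SP/><String CONTENT="world"/>
    </TextLine>
    <TextLine><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_text(SAMPLE_ALTO))
```

Matching on the local element name rather than a fixed namespace is a deliberate simplification, since the ALTO namespace URI changes between schema versions.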
hOCR
- hOCR2ALTO.xsl - Utilities to process and handle hOCR @ONB-RD
- hocr2alto2.0.xsl - Convert hOCR to ALTO 2.0 @filak
- hocr2alto2.1.xsl - Convert hOCR to ALTO 2.1 @filak
- hocr2tei.xsl - Convert hOCR from Tesseract to basic TEI output by @DH2015
- hocr2tei.xsl - Convert hOCR from Tesseract to basic TEI output by @DH2015, updated by @OCR-D
- hocr2text.xsl - Convert hOCR to plain text @filak
- HocrConverter.py - Create a PDF from an hOCR file and an image @jbrinley
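As a rough illustration of what an hOCR-to-text conversion involves, the sketch below collects `ocrx_word` spans inside each `ocr_line` span. It is not taken from any tool listed above. It assumes the hOCR file is well-formed XML (XHTML), which is not guaranteed for every producer, and it only handles the `ocr_line` class, not other line-level classes such as `ocr_caption`. The function name `hocr_to_text` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_class(element: ET.Element, name: str) -> bool:
    """Check whether an element's class attribute contains the given class."""
    return name in element.get("class", "").split()


def hocr_to_text(hocr_xml: str) -> str:
    """Return one line of plain text per hOCR ocr_line element."""
    root = ET.fromstring(hocr_xml)
    lines = []
    for line in root.iter():
        if not has_class(line, "ocr_line"):
            continue
        # itertext() also picks up text wrapped in inline markup such as <strong>.
        words = [
            "".join(word.itertext()).strip()
            for word in line.iter()
            if has_class(word, "ocrx_word")
        ]
        lines.append(" ".join(word for word in words if word))
    return "\n".join(lines)


# Hypothetical, minimal hOCR fragment used only for illustration.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="ocr_page" title="bbox 0 0 100 50">
<span class="ocr_line" title="bbox 0 0 100 20"><span class="ocrx_word" title="bbox 0 0 40 20">Hello</span> <span class="ocrx_word" title="bbox 50 0 100 20"><strong>world</strong></span></span>
<span class="ocr_line" title="bbox 0 25 100 45"><span class="ocrx_word" title="bbox 0 25 100 45">again</span></span>
</div></body></html>"""

if __name__ == "__main__":
    print(hocr_to_text(SAMPLE_HOCR))
```

For hOCR files that are plain HTML rather than XHTML, an HTML parser would be needed instead of `xml.etree.ElementTree`.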
PAGE
- PageConverter.java - Convert ALTO XML, FineReader XML, Google CV, and hOCR to the latest PAGE XML format @prima
- xml_to_box.xsl - Convert PAGE XML to Tesseract box file @eMOP
- page_to_text.py - Extract the text from a PAGE file and write it to stdout @cneud
- PageToPdfConverter.java - Convert PAGE XML files with layout and text content to PDF @prima
- page2tei-0.xsl - Convert PAGE XML to TEI @dariok
- PageToAlto.xsl - Convert PAGE XML to ALTO @Transkribus
- page-to-alto - Convert PAGE XML to ALTO (all versions) @kba
- dinglehopper-extract - Extract the text from ALTO and PAGE XML files @qurator-spk
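The text-extraction entries above can be pictured with a small standard-library sketch for PAGE XML. It is an illustration under stated assumptions, not the implementation of any listed tool: it reads only line-level text, taking the first `TextEquiv/Unicode` that is a direct child of each `TextLine`, and skips word-level and region-level text. The names `page_to_text` and `line_text` and the stripped-down sample (which omits `Coords` and is therefore not schema-valid) are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def line_text(line: ET.Element) -> str:
    """Return the first Unicode text directly attached to a TextLine."""
    for equiv in line:
        if local_name(equiv.tag) != "TextEquiv":
            continue  # skips Word children, which carry their own TextEquiv
        for node in equiv:
            if local_name(node.tag) == "Unicode":
                return node.text or ""
    return ""


def page_to_text(page_xml: str) -> str:
    """Return one line of plain text per PAGE TextLine element."""
    root = ET.fromstring(page_xml)
    return "\n".join(
        line_text(element)
        for element in root.iter()
        if local_name(element.tag) == "TextLine"
    )


# Hypothetical, stripped-down PAGE document used only for illustration.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="sample.png" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Word id="w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Second line</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>region text</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(page_to_text(SAMPLE_PAGE))
```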
TEI
- tei2txt.xsl - Convert DTA TEI-P5 to plain text @haoess
- tei2hocr.xsl - Convert DTA TEI-P5 to hOCR @jbaiter
Other
- iw2alto.xsl - Convert ImageWare MyBib eL OCR to ALTO @karkraeg
- transkribus-xslt - Various stylesheets from Transkribus @readcoop
- transkribus-to-prima - Convert Transkribus dialect to official PAGE XML format @kba
- textract2page - Convert Amazon AWS Textract to PAGE XML @slub
- gcv2hocr - Convert Google Cloud Vision to hOCR @dinosauria123