archive-hocr-tools

Efficient hOCR tooling

Results 6 archive-hocr-tools issues

If your confidence is not a whole number, parsing it throws an exception at line 186 of parse.py ``` Traceback (most recent call last): File "/Users/jaredwhiklo/www/DAM/scripts/archive-pdf-tools/bin/recode_pdf", line 302, in...
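A minimal sketch of tolerant confidence parsing, assuming the value comes from the `x_wconf` property of an hOCR `title` attribute. The function name `parse_word_confidence` is hypothetical and this is not the code at line 186 of parse.py; it only illustrates why `float()` handles values that `int()` rejects.

```python
from typing import Optional


def parse_word_confidence(title: str) -> Optional[float]:
    """Return the x_wconf value from an hOCR title attribute, or None.

    Hypothetical helper for illustration; not part of archive-hocr-tools.
    """
    for prop in title.split(";"):
        fields = prop.split()
        if len(fields) == 2 and fields[0] == "x_wconf":
            # float() accepts both "93" and "95.7";
            # int("95.7") would raise ValueError.
            return float(fields[1])
    return None


print(parse_word_confidence("bbox 10 20 30 40; x_wconf 95.7"))  # 95.7
```

Returning a float keeps the fractional part; a caller that needs a whole number can round the result explicitly.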

Many of the tools currently cannot work with special files such as `/dev/stdin` in bash, or in general accept files from `stdin`, because of some unnecessary seeks. Additionally, it...
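One common workaround for non-seekable input, sketched here under the assumption that the input fits in memory, is to buffer the stream before handing it to code that calls `seek()`. The helper name `make_seekable` is hypothetical and this is not how the tools themselves are implemented; removing the unnecessary seeks, as the issue suggests, avoids the extra buffering.

```python
import io
import sys
from typing import BinaryIO


def make_seekable(stream: BinaryIO) -> BinaryIO:
    """Return a seekable view of a binary stream.

    Hypothetical helper for illustration. Pipes report seekable() == False,
    so their contents are read fully into an in-memory buffer.
    """
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())


if __name__ == "__main__":
    # Works when stdin is a pipe or a redirected file.
    data = make_seekable(sys.stdin.buffer)
    data.seek(0)
```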

This commit handles cases where no `pageType` is detected by skipping the page.

This commit adds support for converting ISO 639-2/B language codes to two-character codes, e.g. `fre` for French rather than the ISO 639-3 `fra`. IA items will often include `fre`, `ger`, etc.,...

This PR adds two commits to address two separate `epubcheck` validation errors. The first relates to the mediatype (and HTML escaping), and the second relates to the table of contents....

This commit uses the item `identifier` as the book title if the item is lacking a `title` in its metadata. The DAISY spec requires a title: https://daisy.org/activities/standards/daisy/daisy-3/z39-86-2005-r2012-specifications-for-the-digital-talking-book/
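The fallback described above can be sketched as follows, assuming the item metadata is available as a dictionary with string values. The function name `book_title` and the dictionary shape are assumptions for illustration, not the commit's actual code.

```python
def book_title(metadata: dict) -> str:
    """Return the item's title, falling back to its identifier.

    Hypothetical helper for illustration; assumes string metadata values.
    """
    title = (metadata.get("title") or "").strip()
    if title:
        return title
    # No usable title: use the identifier so the output always has one.
    return metadata["identifier"]
```

Treating an empty or whitespace-only title the same as a missing one is a design choice of this sketch.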