add hocr-svg
explore editing hocr files via svg files
aka: hocr2svg | inkscape | svg2hocr
status: early draft
problems:
at least with inkscape, the groups make it hard to edit text:
i have to click 14 times to edit a text node in the "layers and objects" widget,
and groups are sorted in reverse order
adding new words requires adding separate bbox and text nodes and putting them in a group node
so at least for now, this is not a replacement for a proper hocr-editor (gImageReader has an embedded hocr-editor)
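
to illustrate why adding a word by hand is tedious, here is a minimal sketch of the group structure described above: one group node per word, holding a bbox node and a text node. this is not the actual hocr2svg code. the function names, attribute choices (class, stroke, font-size) and the sample word are assumptions made for illustration. only the "bbox x0 y0 x1 y1" title format comes from hOCR itself.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title attribute."""
    match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
    if match is None:
        raise ValueError(f"no bbox in title: {title!r}")
    return tuple(int(n) for n in match.groups())


def word_to_svg_group(word_id, title, text):
    """Build a <g> with a <rect> (bbox) and a <text> node for one word.

    Hypothetical layout: the real hocr2svg output may use other attributes.
    """
    x0, y0, x1, y1 = parse_bbox(title)
    group = ET.Element(f"{{{SVG_NS}}}g", {"id": word_id, "class": "ocrx_word"})
    # bbox node: an unfilled rectangle covering the word
    ET.SubElement(group, f"{{{SVG_NS}}}rect", {
        "x": str(x0), "y": str(y0),
        "width": str(x1 - x0), "height": str(y1 - y0),
        "fill": "none", "stroke": "blue",
    })
    # text node: anchored at the bottom-left corner of the bbox
    text_node = ET.SubElement(group, f"{{{SVG_NS}}}text", {
        "x": str(x0), "y": str(y1), "font-size": str(y1 - y0),
    })
    text_node.text = text
    return group


if __name__ == "__main__":
    ET.register_namespace("", SVG_NS)
    group = word_to_svg_group("word_1_1", "bbox 100 200 180 230; x_wconf 95", "hello")
    print(ET.tostring(group, encoding="unicode"))
```

in inkscape, each of these three nodes has to be created and nested manually, which is what makes adding new words slow.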
another use case is:
use hocr2svg for fixed-layout EPUB (FXL EPUB)
with the invisible text layer stored in SVG
and the visible text layer stored in WEBP background images
EPUB 3 supports SVG
either as the entire page (each page = one .svg file)
or embedded inside XHTML (as <svg> elements)
(a modern alternative to searchable PDFs)
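
a rough sketch of what one such page could look like: an svg file with the WEBP scan as background image and the words as invisible (fully transparent) text on top. this is an illustration under assumptions, not the output of hocr2svg: the function name, the file name and the use of fill-opacity and textLength are my choices here. the plain href attribute is SVG 2 syntax, readers limited to SVG 1.1 would need xlink:href instead.

```python
from xml.sax.saxutils import escape, quoteattr


def svg_page(width, height, image_href, words):
    """Return one fixed-layout page as an SVG string.

    words: iterable of (x0, y0, x1, y1, text) tuples in pixel coordinates.
    """
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">',
        # visible layer: the scanned page image
        f'  <image href={quoteattr(image_href)} width="{width}" height="{height}"/>',
    ]
    for x0, y0, x1, y1, text in words:
        # invisible layer: transparent text stretched to the width of the bbox
        lines.append(
            f'  <text x="{x0}" y="{y1}" font-size="{y1 - y0}" '
            f'textLength="{x1 - x0}" lengthAdjust="spacingAndGlyphs" '
            f'fill-opacity="0">{escape(text)}</text>'
        )
    lines.append("</svg>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(svg_page(1000, 1500, "page-0001.webp", [(100, 200, 180, 230, "hello")]))
```

the text stays selectable and searchable in a browser because it is only transparent, not hidden.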
keywords: hOCR-to-SVG converter
also added hocr-to-epub-fxl (aka hocr2epubfxl)
to convert hocr files to a fixed-layout epub file (epub-fxl)
all the popular epub readers i tried (okular, thorium-reader, koodo-reader)
fail to render these epub files
okular comes closest, but the images are blurry/pixelated
and i cannot access the transparent text layer
... so i built my own epub reader in html
stored as index.html in the epub file
so users can unzip the epub file and read it in a web browser
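
since an epub file is a zip archive, the unzip step can also be scripted. a small sketch, assuming index.html sits at the root of the archive (the exact location inside the epub is an assumption, and extract_epub is a hypothetical helper, not part of this repo):

```python
import zipfile
from pathlib import Path


def extract_epub(epub_path, out_dir):
    """Unpack an EPUB (a zip archive) and return the path to its index.html."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)
    # assumption: the embedded reader is stored at the archive root
    index = out_dir / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html at the root of {epub_path}")
    return index
```

the returned path can then be opened with webbrowser.open(index.resolve().as_uri()) from the python standard library.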
example books:
nix package: nur.repos.milahu.archive-hocr-tools