archive-hocr-tools

add hocr-svg

Open • milahu opened this issue 5 months ago • 1 comment

explore editing hocr files via svg files aka: hocr2svg | inkscape | svg2hocr

status: early draft

problems:

at least with inkscape, the groups make it hard to edit text:
i have to click 14 times to edit a text node in the "layers and objects" widget,
and the groups are sorted in reverse order

adding new words requires adding separate bbox and text nodes and putting them in a group node

so at least for now, this is not a replacement for a proper hocr-editor (gImageReader has an embedded hocr-editor)
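to illustrate why each word needs a bbox node and a text node inside a group node, here is a minimal sketch of an hocr-to-svg conversion. it is not the code from this repo: the function names (`parse_bbox`, `hocr_to_svg`) are hypothetical, the output is flat (one `<g>` per word, without the nested page/line groups a real converter would likely emit), and it assumes the hocr file is well-formed XHTML with `ocr_page` and `ocrx_word` elements carrying `bbox` in their `title` attribute.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 40 35; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    match = BBOX_RE.search(title or "")
    return tuple(int(v) for v in match.groups()) if match else None


def hocr_to_svg(hocr_text):
    """Convert well-formed hOCR (assumption) to a flat SVG string."""
    root = ET.fromstring(hocr_text)
    svg = ET.Element("svg", {"xmlns": SVG_NS})
    # root.iter() ignores XML namespaces on tags; we only look at attributes
    for el in root.iter():
        bbox = parse_bbox(el.get("title"))
        if bbox is None:
            continue
        x0, y0, x1, y1 = bbox
        width, height = x1 - x0, y1 - y0
        css_class = el.get("class")
        if css_class == "ocr_page":
            # the page bbox defines the SVG canvas
            svg.set("width", str(width))
            svg.set("height", str(height))
            svg.set("viewBox", f"{x0} {y0} {width} {height}")
        elif css_class == "ocrx_word":
            # one group per word: a rect for the bbox plus a text node
            group = ET.SubElement(svg, "g", {"class": "ocrx_word"})
            ET.SubElement(group, "rect", {
                "x": str(x0), "y": str(y0),
                "width": str(width), "height": str(height),
                "fill": "none", "stroke": "red",
            })
            text = ET.SubElement(group, "text", {
                "x": str(x0), "y": str(y1), "font-size": str(height),
            })
            text.text = "".join(el.itertext()).strip()
    return ET.tostring(svg, encoding="unicode")
```

in this layout, adding a word by hand in inkscape means creating the `<rect>`, creating the `<text>`, and wrapping both in a `<g>`, which is the extra work described above.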

another use case: use hocr2svg for fixed-layout EPUB (FXL EPUB),
with the invisible text layer stored in SVG
and the visible text layer stored in WEBP background images
(a modern alternative to searchable PDFs)

EPUB 3 supports SVG either as the entire page (each page = one .svg file)
or embedded inside XHTML (as <svg> elements)
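a rough sketch of the "SVG embedded inside XHTML" variant, to make the layering concrete. this is not the output format of hocr2svg: the function name `fxl_page` and the `(x, y, font_size, text)` word tuples are hypothetical, and hiding the text with `fill-opacity="0"` is just one possible way to make it invisible but still selectable. the viewport meta tag is what fixed-layout EPUB 3 pages use to declare their size.

```python
from xml.sax.saxutils import escape, quoteattr


def fxl_page(width, height, image_href, words):
    """Build one fixed-layout XHTML page: background image plus invisible SVG text.

    words is a list of (x, y, font_size, text) tuples (hypothetical format).
    """
    texts = "\n".join(
        f'    <text x="{x}" y="{y}" font-size="{size}" fill-opacity="0">'
        f"{escape(text)}</text>"
        for (x, y, size, text) in words
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>page</title>
  <meta name="viewport" content="width={width}, height={height}"/>
</head>
<body>
  <svg xmlns="http://www.w3.org/2000/svg"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       viewBox="0 0 {width} {height}" width="{width}" height="{height}">
    <image xlink:href={quoteattr(image_href)} width="{width}" height="{height}"/>
{texts}
  </svg>
</body>
</html>
"""
```

for example, `fxl_page(800, 1200, "page1.webp", [(10, 35, 15, "hello")])` returns one page with the WEBP scan as the visible layer and the word "hello" as transparent text on top of it.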

keywords: hOCR-to-SVG converter

milahu, Aug 18 '25 11:08

also added hocr-to-epub-fxl (aka hocr2epubfxl) to convert hocr files to a fixed-layout epub file (epub-fxl)

all the popular epub readers (okular, thorium-reader, koodo-reader) fail to render these epub files
okular comes closest, but the images are blurry/pixelated and i cannot access the transparent text layer
... so i built my own epub reader in html, stored as index.html in the epub file,
so users can unzip the epub file and read it in a web browser
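the unzip-and-open step could be scripted roughly like this. it is only a sketch: the helper names (`extract_epub`, `open_in_browser`) are hypothetical, and it assumes index.html sits at the root of the epub archive.

```python
import tempfile
import webbrowser
import zipfile
from pathlib import Path


def extract_epub(epub_path, out_dir=None):
    """Unzip an epub and return the path to its index.html.

    Assumes index.html is stored at the archive root (assumption).
    """
    out = Path(out_dir) if out_dir else Path(tempfile.mkdtemp(prefix="epub-"))
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out)
    index = out / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html in {epub_path}")
    return index


def open_in_browser(epub_path):
    """Extract the epub to a temp dir and open its reader page in the default browser."""
    index = extract_epub(epub_path)
    webbrowser.open(index.resolve().as_uri())
```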

example books:

nix package: nur.repos.milahu.archive-hocr-tools

milahu, Oct 23 '25 23:10