jbarlow comments

Results: 380 comments of jbarlow

mzucker's Noteshrink integration

That looks really interesting. This is the sort of thing I'd like to use the forthcoming plugin architecture for, if/when I get a chance to finish it.

mzucker's Noteshrink integration

Yes, you could write a plugin that hooks `filter_page_image` for example.
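
For illustration, here is a minimal sketch of what such a plugin file might look like. It assumes the `filter_page_image` hook receives a page context and the path of the page image, and returns the path of the image to use instead; check the plugin documentation for your OCRmyPDF version before relying on that. `process_image` is a hypothetical placeholder (it only copies the file) marking where Noteshrink-style processing would go.

```python
import shutil
from pathlib import Path

try:
    from ocrmypdf import hookimpl
except ImportError:
    # Fallback so this sketch can be imported without OCRmyPDF installed.
    def hookimpl(func):
        return func


def process_image(src: Path, dst: Path) -> None:
    """Hypothetical placeholder: a real plugin would write a cleaned-up
    version of src to dst. Here the file is copied unchanged."""
    shutil.copyfile(src, dst)


@hookimpl
def filter_page_image(page, image_filename):
    # Assumption: the hook returns the path of the image that should
    # replace the original page image.
    src = Path(image_filename)
    dst = src.with_name(src.stem + "_filtered" + src.suffix)
    process_image(src, dst)
    return dst
```

If saved as `my_plugin.py` (a made-up name), it could be loaded with something like `ocrmypdf --plugin my_plugin.py input.pdf output.pdf`.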

hocr import / export

I think this would be great, but there's a lot to do to make it work, especially to support after-the-fact editing.

hocr import / export

ocrmypdf already has the ability to merge hOCR HTML into PDF through its public APIs. What it does not have is a convenient way to run its post-processing on a...

hocr import / export

@tukusejssirs The relevant code is in hocrtransform.py. See `python -m ocrmypdf.hocrtransform --help`.

hocr import / export

No, it doesn't have that ability, but you could split the hOCR and run a loop.
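
A rough sketch of the splitting step, using only the standard library. It assumes the hOCR is well-formed XHTML in which each page is an element with class `ocr_page`, as Tesseract emits, and that pages are not nested inside each other. `split_hocr_pages` is an illustrative helper, not part of ocrmypdf.

```python
import copy
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def _is_page(element):
    """True if the element carries the hOCR page class."""
    return "ocr_page" in element.get("class", "").split()


def split_hocr_pages(hocr_text):
    """Return one hOCR document (as a string) per ocr_page element."""
    # Keep XHTML as the default namespace when serializing.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(hocr_text)
    n_pages = sum(
        1 for parent in root.iter() for child in parent if _is_page(child)
    )

    documents = []
    for keep in range(n_pages):
        doc = copy.deepcopy(root)
        index = 0
        # Walk a snapshot of the tree and drop every page except `keep`.
        for parent in list(doc.iter()):
            for child in list(parent):
                if _is_page(child):
                    if index != keep:
                        parent.remove(child)
                    index += 1
        documents.append(ET.tostring(doc, encoding="unicode"))
    return documents
```

Each returned string could then be written to its own file and passed through `ocrmypdf.hocrtransform` in a loop, one page at a time.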

hocr import / export

It looks like the XML (`024_hocr.html`) is invalid, specifically at line 45.
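
One quick way to locate this kind of problem is to run the file through a strict XML parser and read the error position. The sketch below uses Python's standard library; note that it checks well-formedness only, not conformance to the hOCR spec, and `find_xml_error` is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return None if text parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # position is (line, column); the line number is 1-based.
        line, column = exc.position
        return line, column, str(exc)
    return None
```

For example, `find_xml_error(open("024_hocr.html", encoding="utf-8").read())` would report the first line the parser cannot handle.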

hocr import / export

ocrmypdf.hocrtransform is only capable of parsing the subset of hOCR generated by Tesseract. For this specific case, you'll need to add a string like the following to the top of...

hocr import / export

(Note that this doctype signature may be incorrect for hOCR; use whatever the hOCR spec specifies.)

hocr import / export

Official definition is ```xml ``` From: https://www.w3.org/TR/html4/sgml/entities.html