Robert Sachunsky comments

Results: 723 comments by Robert Sachunsky

ocrd-tesserocr-crop: 22.5h processing time

> Perhaps we should open an issue in core for the general scenario of early downsampling (as a derived image) and then re-using that image instead of the original (with...

allow intermediate PAGE annotation for word segmentation ambiguity

@kba Point 3 was about representation of the _output_ of multi-OCR alignment on the line level (not on the page level), so it's about word segmentation (not line segmentation). @finkf...

allow intermediate PAGE annotation for word segmentation ambiguity

Some considerations which might help to swing the decision between a specialised PAGE annotation (with deviating semantics) and a new customised XML format: pro PAGE: 1. We already use it...

allow intermediate PAGE annotation for word segmentation ambiguity

Sorry to get back so late, but this problem seems to be a Gordian knot of sorts. Getting good real-life example data entails having some OCR which can already give...

allow intermediate PAGE annotation for word segmentation ambiguity

Sure! So here is what the above (artificial) example could look like: ```XML m n r i r y v p , o a e y my pay ``` ![dot...

allow intermediate PAGE annotation for word segmentation ambiguity

Thanks, @chris1010010. It is, of course, your decision, but that option would also make it impossible to enforce the new element to be a terminal alternative to line, word and...

allow intermediate PAGE annotation for word segmentation ambiguity

@chris1010010 Dear Christian, too bad, but thanks for detailing your reasons! Do you still want me to separate the 3 runup changesets mentioned above (without the actual lattice extension) and...

allow intermediate PAGE annotation for word segmentation ambiguity

@chris1010010 I now separated the lattice extension proposal from the other purely cosmetic commits and made a [PR](https://github.com/OCR-D/PAGE-XML/pull/6) from the latter. @kba I updated (with forced push) the lattice extension...

allow intermediate PAGE annotation for word segmentation ambiguity

@cneud thanks for bringing this to attention. Yes, I am aware of CITlab's confmat approach/format. (In fact, I have [linked](https://github.com/OCR-D/spec/issues/72#issuecomment-469019453) to it above and mentioned the possibility to extend PAGE...

Required parameter 'level-of-operation'

I am much in favour of a uniform parameter name for this in the spec. I would even argue that usually processors should provide that parameter (with a default) even...