Robert Sachunsky

Results: 730 comments by Robert Sachunsky
Sorry, I cannot reproduce how you got here. The broken METS is definitely the reason for the cropper to misbehave, which in turn is the reason for the second binarizer to...

@mikegerber I think we established that the root cause was an outdated OCR-D version in Quiver, which had a bug that produced broken METS prior to this step. Is that...

This needs to be tested systematically. I expect to see **both** degradation and improvement, depending on how hard binarization is. See [here](https://github.com/tesseract-ocr/tesseract/issues/3083) for an explanation.

> or perhaps should be parameterizable.

I thought about that, but at workflow configuration time, you have next to no chance of knowing which is going to be better. (I...

The above-mentioned issue in ocrd_tesserocr has been closed now (because a first proof-of-concept implementation has been merged there), but the discussion of the open problems, and of adding detail...

Meanwhile, **another related issue** came up: Now that we have the possibility of _implicit output file groups_ in METS-XML via the derived images referenced in PAGE-XML, no workspace engine will...

Yet another **open problem** has [surfaced](https://github.com/OCR-D/core/pull/639): When a processor _changes_ coordinates of some existing segment, it must also remove all existing derived images for that segment, because they will be...
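
The rule stated above can be sketched on a bare PAGE-XML fragment. This is only an illustration using the standard library, not the OCR-D core API: the helper `set_coords_and_drop_derived`, the sample fragment, and its file names are all hypothetical, and the sketch assumes derived images appear as `AlternativeImage` children of the segment in the 2019-07-15 PAGE namespace.

```python
import xml.etree.ElementTree as ET

# PAGE-XML namespace (2019-07-15 schema version, assumed for this sketch)
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}


def set_coords_and_drop_derived(segment, points):
    """Hypothetical helper: set new Coords on a segment and remove the
    AlternativeImage elements that are direct children of that segment.

    Returns the number of AlternativeImage elements removed.
    Child segments (e.g. lines inside a region) are left untouched here.
    """
    coords = segment.find("pc:Coords", NS)
    coords.set("points", points)
    removed = 0
    # findall returns a list, so removing while looping is safe
    for alt in segment.findall("pc:AlternativeImage", NS):
        segment.remove(alt)
        removed += 1
    return removed


# Made-up fragment: a region with one derived image and one line below it
SAMPLE = """<TextRegion xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" id="r1">
  <AlternativeImage filename="OCR-D-IMG-BIN/r1.png" comments="binarized"/>
  <Coords points="0,0 100,0 100,50 0,50"/>
  <TextLine id="l1">
    <AlternativeImage filename="OCR-D-IMG-BIN/l1.png" comments="binarized"/>
    <Coords points="5,5 95,5 95,45 5,45"/>
  </TextLine>
</TextRegion>"""

region = ET.fromstring(SAMPLE)
removed = set_coords_and_drop_derived(region, "0,0 120,0 120,60 0,60")
```

Whether derived images of child segments should be dropped as well is a separate design question that this sketch deliberately leaves open.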

Also, this parameter should be called just `polygons` (because it is independent of how cropping is done now).

> Polygons should be the default.

I agree, but we still have the issue of Tesseract generating invalid (self-intersecting) polygon paths internally, which end up in very strange ways on...
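
To make "self-intersecting" concrete, here is a minimal pure-Python check one could run on a polygon before writing it out. It is a sketch under stated assumptions, not what Tesseract or ocrd_tesserocr does: the function names are hypothetical, points are `(x, y)` tuples of a closed ring, and only proper crossings between non-adjacent edges are detected (touching or collinear overlaps are ignored).

```python
def _orient(a, b, c):
    """Signed area of triangle a-b-c (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def is_self_intersecting(points):
    """Hypothetical check: does the closed ring through `points` cross itself?

    Quadratic in the number of vertices, which is fine for a sketch.
    """
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # neighbouring edges share a vertex and cannot properly cross
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
bowtie = [(0, 0), (10, 10), (10, 0), (0, 10)]  # edges 0 and 2 cross at (5, 5)
square_crosses = is_self_intersecting(square)
bowtie_crosses = is_self_intersecting(bowtie)
```

With the two sample rings, the square is reported as valid and the bowtie as self-intersecting.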