Robert Sachunsky
This appears to affect all kinds of regions, but only when they have been rotated internally. Anyway, this is _not_ about clipping to the image/rectangle.
We now have a partial solution in [Tesseract itself](https://github.com/tesseract-ocr/tesseract/pull/2826), but on top of that I still hesitate to make a PR for the convex_hull workaround here...
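For readers unfamiliar with the idea behind a convex-hull workaround, here is a minimal sketch. It is not the code from the Tesseract PR or from this repository: it only illustrates replacing a possibly ill-formed region outline with its convex hull, using Andrew's monotone chain algorithm in plain Python. The function name `convex_hull` and the sample `outline` are hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points (Andrew's monotone chain).

    Illustrative only: `points` is a list of (x, y) tuples, e.g. the
    outline of a text region. The result starts at the lexicographically
    smallest point and omits interior and collinear points.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # drop points that would make a non-left turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


# hypothetical concave outline with a notch at (2, 1)
outline = [(0, 0), (4, 0), (4, 3), (2, 1), (0, 3)]
hull = convex_hull(outline)
print(hull)
```

The hull always contains the original outline, so no text is cut off, but it can also take in neighbouring content. That trade-off may be one reason to be cautious about such a workaround.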
What if instead of trying to find the bug deep inside Tesseract's polyblk generator we take the liberty of annotating text regions _along with text lines_ in one pass? (Perhaps...
I am not sure we need this as part of the specs anymore. It has been [implemented in core](https://github.com/OCR-D/core/pull/623) already, but since this applies to the internals of Python implementations, I...
> We are using this at our groundtruth frame level. "Groundtruth Frame" means that we almost never transcribe a whole page as groundtruth. Most times we use about the half...
@tboenig
> The proposal
> `TextRegion[@type="other",@custom="subtype:column"]`
>
> leads us in the right direction, but:
>
> my suggestion:
> `TextRegion[@type="paragraph",@custom="#column"]`
>
> @type="other"
> should be used for regions...
> Algorithms should declare their "level of operation"

Where should they do so, in their `ocrd-tool.json` perhaps? How does workflow configuration get to know otherwise? And what if they are...
Thanks @MihoMahi for the report! Does this only happen in the default `textequiv_level=word`, or also with `textequiv_level=glyph`?
Yes, you might want to ignore the glyph level, as it contains alternative OCR hypotheses. But the difference in the word level tells us that the blame is actually on...
@VolkerHartmann Sorry, I am not so sure what it is you are asking me for. This issue is about OCR model meta-data, and I already find the list of features...