Robert Sachunsky comments

Results: 730 comments of Robert Sachunsky


segment-region: crop_polygons creates invalid coordinates

This appears to affect all kinds of regions, but only when they have been rotated internally. Anyway, this is _not_ about clipping to the image/rectangle.

segment-region: crop_polygons creates invalid coordinates

We now have a partial solution in [Tesseract itself](https://github.com/tesseract-ocr/tesseract/pull/2826), but on top of that I still hesitate to make a PR for the convex_hull workaround here...
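
The comment above names a convex_hull workaround without spelling it out. As a minimal sketch, assuming the workaround means replacing an invalid (for example self-intersecting) region outline with its convex hull, the idea could look like the following. The function names `cross` and `convex_hull` and the sample coordinates are hypothetical illustrations, not code from Tesseract or from the OCR-D processors, which would more likely rely on a geometry library.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull without a repeated end point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical self-intersecting ("bow-tie") outline, invalid as a polygon.
bowtie = [(0, 0), (10, 10), (10, 0), (0, 10)]
hull = convex_hull(bowtie)
```

The hull is always a simple polygon, which is what makes it usable as a fallback. The cost is that it can cover more area than the original outline when the region is concave.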

segment-region: crop_polygons creates invalid coordinates

What if instead of trying to find the bug deep inside Tesseract's polyblk generator we take the liberty of annotating text regions _along with text lines_ in one pass? (Perhaps...

Extract descriptive metadata from docstrings

I am not sure we need this as part of the specs anymore. It has been [implemented in core](https://github.com/OCR-D/core/pull/623) already, but since this applies to internals of Python implementations, I...

Metadata for annotate the type of a groundtruth-dataset

> We are using this at our groundtruth frame level. "Groundtruth Frame" means, that we almost never transcript a whole page as groundtruth. Most times we use about the half...

Metadata for annotate the type of a groundtruth-dataset

@tboenig
> The proposal
> `TextRegion[@type="other",@custom="subtype:column"]`
>
> leads us in the right direction, but:
>
> my suggestion:
> `TextRegion[@type="paragraph",@custom="#column"]`
>
> @type="other"
> should be used for regions...

OCR on line vs word level

> Algorithms should declare their "level of operation"

Where should they do so, in their `ocrd-tool.json` perhaps? How does workflow configuration get to know otherwise? And what if they are...

reverse order of glyphs inside words in PAGE-File for RTL languages

Thanks @MihoMahi for the report! Does this only happen in the default `textequiv_level=word`, or also with `textequiv_level=glyph`?

reverse order of glyphs inside words in PAGE-File for RTL languages

Yes, you might want to ignore the glyph level, as it contains alternative OCR hypotheses. But the difference in the word level tells us that the blame is actually on...

Metadata for OCR models and/or OCR model training sets

@VolkerHartmann Sorry, I am not so sure what it is you are asking me for. This issue is about OCR model meta-data, and I already find the list of features...