Robert Sachunsky

Results 272 issues of Robert Sachunsky

In the OCR-D workflow, there are several steps that likely require input or output to be able to represent __word segmentation ambiguity__ and confidence values of word boundaries (whitespace characters):...

The current [specification](https://ocr-d.github.io/glossary#OCR) is agnostic about which **level of segmentation** OCR is supposed to operate on, either `TextLine` layout input (for `TextLine`, `Word` or `Glyph` text output), or `Word` layout...

I tried to run the `selected_page_ocr` workflow on `reichsanzeiger_random_selected_pages_ocr` (removed all filegroups except OCR-D-IMG and OCR-D-GT-SEG-LINE to start with) and encountered a different problem (using latest ocrd/all:maximum image): ``` 15:51:21.928...

https://github.com/OCR-D/ocrd_tesserocr/blob/ac274656bfb7021ccd752bccd947e53403b4a909/ocrd_tesserocr/recognize.py#L370-L375

Currently all we get is bounding boxes, which for historic print often overlap heavily. Tesseract internally of course "knows" (has already decided) which component belongs to which line, but how do...


When using `ocrd-tesserocr-segment-region` with `crop_polygons=True`, one will frequently get coordinates extending beyond the segment bbox, which could easily end up as negative coordinates (which are syntactically forbidden in PAGE-XML). So maybe...
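
One conceivable way to avoid negative coordinates is to clamp every polygon point to the page frame before serializing it. The following is only a minimal sketch of that idea, not what `ocrd-tesserocr-segment-region` actually does: the helper names `clamp_polygon` and `points_to_page_string` are hypothetical, and the page width and height are assumed to be known to the caller.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def clamp_polygon(points: List[Point], width: int, height: int) -> List[Point]:
    """Clamp each (x, y) point into the frame [0, width] x [0, height].

    Hypothetical helper: it only guarantees non-negative, in-frame
    coordinates; it does not check that the result is still a valid
    (non-degenerate) polygon.
    """
    return [(min(max(x, 0), width), min(max(y, 0), height)) for x, y in points]


def points_to_page_string(points: List[Point]) -> str:
    """Serialize points in the PAGE-XML `points` attribute style: 'x1,y1 x2,y2 ...'."""
    return " ".join(f"{x},{y}" for x, y in points)


if __name__ == "__main__":
    polygon = [(-5, 10), (120, -3), (50, 50)]
    clamped = clamp_polygon(polygon, width=100, height=80)
    print(points_to_page_string(clamped))
```

Clamping is the simplest choice, but it can collapse several points onto the same edge; intersecting the polygon with the page rectangle would preserve the shape better at the cost of a geometry dependency.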

Some early fixes for the recent #77 – sorry to get back to you so soon @finkf (and thanks for merging so fast). BTW, if you contemplate making a new...

We still have `calamari_ocr == 0.3.5` in `install_requires`. That's quite a drag IMO: it is quite outdated, pulls in Tensorflow/Keras (themselves outdated), and is probably not really necessary (since we've had...

As outlined a while ago, https://github.com/cisocrgroup/ocrd_cis/blob/c3fad1a8b04dc5a305460e8bb3c54cb79cd75515/ocrd_cis/ocropy/common.py#L111-L118 there are plenty of opportunities to improve `ocrd-cis-ocropy-deskew`:
- downscale during estimation when pixel density (or resolution) is large
- use hill-climbing approach with...