Robert Sachunsky comments (735 results)

Step 14: tesseract model GT4HistOCR_50000000.997_191951

Already mentioned in #297, BTW. But ideally, we should recommend nothing other than frak2021.

GPU processors: (uniformly) log CPU vs GPU use

> e.g. in my environment I can only "believe" that `ocrd-kraken-segment` uses the GPU.

BTW, I just amended https://github.com/OCR-D/ocrd_kraken/pull/38 to include such warnings for Kraken.
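
Logging the device choice uniformly could look roughly like the following sketch. It assumes a PyTorch-based processor (Kraken uses PyTorch); `select_device` is a hypothetical helper for illustration, not the code from the ocrd_kraken PR. The point is that every processor states explicitly which device it ended up on, so users no longer have to "believe" it.

```python
import importlib.util
import logging

logger = logging.getLogger("gpu_check")


def select_device(requested: str = "auto") -> str:
    """Hypothetical helper: choose 'cuda' or 'cpu' and always log the decision."""
    if requested == "cpu":
        logger.info("Using CPU (requested explicitly)")
        return "cpu"
    cuda_ok = False
    # Only query CUDA if PyTorch is installed at all
    if importlib.util.find_spec("torch") is not None:
        import torch
        cuda_ok = torch.cuda.is_available()
    if cuda_ok:
        logger.info("Using GPU (CUDA available)")
        return "cuda"
    if requested == "cuda":
        logger.warning("GPU requested but CUDA is not available, falling back to CPU")
    else:
        logger.info("Using CPU (no CUDA device found)")
    return "cpu"
```

Call `logging.basicConfig(level=logging.INFO)` first if the informational messages should be visible; the fallback warning shows up at the default level.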

segmentation: tesseract5.3.0 vs ocrd/all:2022-08-15

Hard to tell from a diff tool that I don't know and data I cannot see. It looks like two lines are duplicated in the ocrd-tesserocr output. Binarization will have an impact, yes...
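
To check for duplicated lines in two plain-text OCR outputs without relying on a particular diff tool, a minimal sketch with Python's standard `difflib` and `collections` might look like this. The sample strings are made up for illustration and stand in for the exported text of the two runs.

```python
import difflib
from collections import Counter
from typing import List


def duplicated_lines(text: str) -> List[str]:
    """Return non-empty lines that occur more than once, in first-seen order."""
    counts = Counter(line for line in text.splitlines() if line.strip())
    return [line for line in counts if counts[line] > 1]


def line_diff(a: str, b: str, name_a: str = "a", name_b: str = "b") -> str:
    """Unified line-by-line diff between two OCR text outputs."""
    return "\n".join(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm=""))


# Made-up sample outputs, for illustration only
tess = "Erste Zeile\nZweite Zeile\nDritte Zeile"
ocrd = "Erste Zeile\nZweite Zeile\nZweite Zeile\nDritte Zeile"
print(duplicated_lines(ocrd))
print(line_diff(tess, ocrd, "tesseract", "ocrd-tesserocr"))
```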

Thanks @jbarth-ubhd for checking thoroughly! BTW, if you want to try any of the better Calamari 2 models [here](https://github.com/Calamari-OCR/calamari_models_experimental) and [there](https://github.com/OCR4all/ocr4all_models) (probably also [here](https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/calamari/AustrianNewspapers/)), you currently have to switch to...

> But the error message `ValueError: bad marshal data (unknown type code)` is the same for deep3...

Ouch. With Python >= 3.8, we are now being hit hard by https://github.com/Calamari-OCR/calamari/issues/78. The solution...

> on manually cropped [...] but perspective **not** corrected (very minimal perspective distortion)
>
> Note that there are more errors than in the first ocr-comparison-Image here

Hard to tell...

> > ok `ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR6 -P segmentation_level region -P textequiv_level word -P find_tables true -P overwrite_segments true -P model Fraktur_GT4HistOCR` gives exactly the same results as tesseract5.3.0...
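
When comparing many parameter combinations like the one quoted above, it can help to assemble the command from a dict instead of retyping it. The sketch below only uses the flags that appear in the quoted call (`-I`, `-O`, `-P key value`); `build_recognize_cmd` is a hypothetical helper, not part of ocrd_tesserocr.

```python
import shlex
from typing import Dict, List


def build_recognize_cmd(input_grp: str, output_grp: str,
                        params: Dict[str, str]) -> List[str]:
    """Hypothetical helper: build an argv list with one `-P key value` per parameter."""
    cmd = ["ocrd-tesserocr-recognize", "-I", input_grp, "-O", output_grp]
    for key, value in params.items():
        cmd += ["-P", key, value]
    return cmd


# Parameters copied from the quoted invocation
params = {
    "segmentation_level": "region",
    "textequiv_level": "word",
    "find_tables": "true",
    "overwrite_segments": "true",
    "model": "Fraktur_GT4HistOCR",
}
cmd = build_recognize_cmd("OCR-D-IMG", "OCR-D-OCR6", params)
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)` from inside the workspace directory; swapping `"model"` for `frak2021` is then a one-line change.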

Docker: interference with older versions of core

Also, some sub-submodules (e.g. ocrd_fileformat → ocr-fileformat → textract2page) will pull ocrd via PyPI.

> Also, some sub-submodules (e.g. ocrd_fileformat → ocr-fileformat → textract2page) will pull ocrd via PyPI.

Because they may depend on `ocrd` in certain version ranges that are incompatible with the base...
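
When pip installs a package whose declared `ocrd` range is not satisfied by the version already present in the base image, it replaces that version with one from PyPI. A rough sketch of that check follows; the range `(2, 0, 0)` to `(3, 0, 0)` is purely hypothetical, and the naive version parser is an assumption for illustration (real code should rather use `packaging.specifiers`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Naive parser: '2.58.1' -> (2, 58, 1); keeps only the numeric prefix of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so that '2.0' compares like '2.0.0'
        parts.append(0)
    return tuple(parts)


def in_range(ver: str, minimum: Tuple[int, ...], below: Tuple[int, ...]) -> bool:
    """True if minimum <= ver < below."""
    return minimum <= parse_version(ver) < below


def installed_version(dist: str) -> Optional[str]:
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


# Hypothetical range that some sub-submodule might declare
required_min, required_below = (2, 0, 0), (3, 0, 0)
found = installed_version("ocrd")
if found is None:
    print("ocrd is not installed; pip would pull it from PyPI")
elif not in_range(found, required_min, required_below):
    print(f"installed ocrd {found} is outside the required range; pip would replace it")
else:
    print(f"installed ocrd {found} satisfies the required range")
```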

Release version 0.3.0 and 1.0.0

I'd recommend going through the open issues once more and deciding/labelling whether any of those are important enough to block a v1.0. Perhaps Travis CI can be updated and extended...