Aug 24 '20 18:08 bertsky

Codecov Report

Merging #144 into master will increase coverage by 0.04%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   37.73%   37.77%   +0.04%     
==========================================
  Files           9        9              
  Lines        1023      998      -25     
  Branches      216      212       -4     
==========================================
- Hits          386      377       -9     
+ Misses        565      555      -10     
+ Partials       72       66       -6

Impacted Files	Coverage Δ
ocrd_tesserocr/crop.py	`13.51% <ø> (+0.78%)`	:arrow_up:
ocrd_tesserocr/segment_line.py	`63.63% <ø> (-8.68%)`	:arrow_down:
ocrd_tesserocr/segment_region.py	`53.64% <ø> (+4.21%)`	:arrow_up:
ocrd_tesserocr/segment_table.py	`0.00% <0.00%> (ø)`
ocrd_tesserocr/recognize.py	`47.75% <0.00%> (-1.00%)`	:arrow_down:
ocrd_tesserocr/binarize.py	`22.95% <0.00%> (+1.63%)`	:arrow_up:
ocrd_tesserocr/deskew.py	`17.34% <0.00%> (+1.88%)`	:arrow_up:
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 24b7ced...2b3e8d6.

Aug 24 '20 18:08 codecov[bot]

This needs to be tested systematically. I expect to see both degradation and improvement, depending on how hard binarization is. See here for explanation.

Aug 24 '20 18:08 bertsky

> or perhaps should be parameterizable.

I thought about that, but at workflow configuration time, you have next to no chance of knowing which option is going to be better. (I would guess that only input images which fare well under global Otsu are better off with the change. But we have no automatic indicator of binarization quality yet. At the very least, we should strive for some estimator based on the local distribution of connected component statistics.)

But I still hope that we can fix the problem in Tesseract itself.
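
For readers unfamiliar with the term, here is a minimal sketch of global Otsu thresholding as mentioned above, in plain Python. It is illustrative only and is not the code path of Tesseract or ocrd_tesserocr; the function names `otsu_threshold` and `binarize` are made up for this example, and the input is assumed to be a flat sequence of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return the global Otsu threshold for 8-bit gray values (0..255).

    Values <= threshold form the dark class, values > threshold the bright class.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # all pixels are in the dark class from here on
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        # Between-class variance (up to a constant factor of 1 / total**2).
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels, threshold):
    """Map each gray value to 0 (dark) or 1 (bright) using one global threshold."""
    return [1 if value > threshold else 0 for value in pixels]


# Toy example: a clean bimodal "image" with dark ink and bright paper.
toy_pixels = [10] * 50 + [200] * 50
toy_threshold = otsu_threshold(toy_pixels)
toy_binary = binarize(toy_pixels, toy_threshold)
```

Because a single threshold is applied to the whole page, this works well on cleanly bimodal input like the toy example, and tends to struggle when illumination or background varies across the image.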

Aug 25 '20 10:08 bertsky

ocrd_tesserocr

Segmentation on raw images
