ocrd_tesserocr
Segmentation on raw images
Codecov Report
Merging #144 into master will increase coverage by 0.04%. The diff coverage is 0.00%.
@@ Coverage Diff @@
## master #144 +/- ##
==========================================
+ Coverage 37.73% 37.77% +0.04%
==========================================
Files 9 9
Lines 1023 998 -25
Branches 216 212 -4
==========================================
- Hits 386 377 -9
+ Misses 565 555 -10
+ Partials 72 66 -6
| Impacted Files | Coverage Δ | |
|---|---|---|
| ocrd_tesserocr/crop.py | 13.51% <ø> (+0.78%) | :arrow_up: |
| ocrd_tesserocr/segment_line.py | 63.63% <ø> (-8.68%) | :arrow_down: |
| ocrd_tesserocr/segment_region.py | 53.64% <ø> (+4.21%) | :arrow_up: |
| ocrd_tesserocr/segment_table.py | 0.00% <0.00%> (ø) | |
| ocrd_tesserocr/recognize.py | 47.75% <0.00%> (-1.00%) | :arrow_down: |
| ocrd_tesserocr/binarize.py | 22.95% <0.00%> (+1.63%) | :arrow_up: |
| ocrd_tesserocr/deskew.py | 17.34% <0.00%> (+1.88%) | :arrow_up: |
| ... and 2 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 24b7ced...2b3e8d6.
This needs to be tested systematically. I expect to see both degradation and improvement, depending on how hard binarization is. See here for explanation.
Or perhaps this should be parameterizable.
I thought about that, but at workflow configuration time, you have next to no chance of knowing which variant is going to work better. (I would guess that only input images which fare well under global Otsu are better off with the change. But we have no automatic indicator of binarization quality yet. At the very least, we should strive for some estimator based on the local distribution of connected component statistics.)
But I still hope that we can fix the problem in Tesseract itself.
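To make the estimator idea concrete, here is a minimal sketch of what "local distribution of connected component statistics" could look like. It is not part of ocrd_tesserocr or Tesseract: the function names, the tile size, and the "share of tiny components" heuristic are all assumptions chosen for illustration, and the input is assumed to be an already binarized image given as nested lists of 0 and 1.

```python
from collections import deque


def component_sizes(binary):
    """Return the pixel count of each 4-connected foreground component.

    `binary` is a list of rows containing 0 (background) or 1 (foreground).
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    sizes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Flood-fill one component with a breadth-first search.
            seen[y][x] = True
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def speckle_ratio_per_tile(binary, tile=4, min_size=2):
    """Map each tile origin (y, x) to the share of components below `min_size`.

    Hypothetical heuristic: a high share of tiny components in a tile hints at
    speckle noise, i.e. a locally poor binarization. Components are labelled
    inside each tile independently, so one cut by a tile border is counted
    once per tile. Tiles without any foreground map to None.
    """
    ratios = {}
    for ty in range(0, len(binary), tile):
        for tx in range(0, len(binary[0]), tile):
            window = [row[tx:tx + tile] for row in binary[ty:ty + tile]]
            sizes = component_sizes(window)
            if sizes:
                ratios[(ty, tx)] = sum(s < min_size for s in sizes) / len(sizes)
            else:
                ratios[(ty, tx)] = None
    return ratios
```

A real indicator would need calibration against ground truth and a faster labelling routine. The sketch only shows the shape of the computation: label components per local window, then summarize their size distribution.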