camelot
[MRG] Add saturation threshold option for low contrast tables
I ran into this problem while trying to parse a table with a low-contrast background color. The -back option didn't work for low-contrast areas such as the last row, so I've added a new option, -color (--process_color_background), which increases the contrast so that the table borders can be detected reliably.
Here's camelot (master) result:

Here's my branch with -color option enabled:

As you can see, the branch adds one more step, which is basically a binary threshold separating low-saturation pixels from pixels with no saturation. The borders are now far more pronounced, and camelot has no trouble detecting all the rows.
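To illustrate the idea, here is a minimal sketch of a saturation threshold, not the code from this branch. It computes HSV saturation by hand with numpy (the PR itself goes through OpenCV's `cv2.COLOR_BGR2HSV` conversion), and the function name `saturation_mask` and its `threshold` parameter are hypothetical names chosen for this example.

```python
import numpy as np

def saturation_mask(bgr, threshold=0):
    """Return a binary mask: 255 where HSV saturation exceeds `threshold`, else 0.

    `bgr` is an HxWx3 uint8 image in BGR channel order.
    `threshold` is on a 0-255 saturation scale (hypothetical parameter).
    """
    bgr = bgr.astype(np.float32)
    cmax = bgr.max(axis=2)
    cmin = bgr.min(axis=2)
    # HSV saturation is (max - min) / max, defined as 0 for black pixels.
    sat = np.zeros_like(cmax)
    nonzero = cmax > 0
    sat[nonzero] = (cmax[nonzero] - cmin[nonzero]) / cmax[nonzero]
    sat_8bit = np.round(sat * 255)
    return np.where(sat_8bit > threshold, 255, 0).astype(np.uint8)

# 1x3 test image in BGR order: white page, pale blue cell, gray pixel.
img = np.array([[[255, 255, 255], [255, 240, 230], [128, 128, 128]]], dtype=np.uint8)
mask = saturation_mask(img, threshold=10)
```

In this toy image only the pale blue pixel ends up in the mask. White and gray pixels both have zero saturation, so even a faintly tinted cell becomes a solid block whose edges are easy to pick up.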
Codecov Report
Merging #203 into master will decrease coverage by 0.60%. The diff coverage is 21.42%.
@@ Coverage Diff @@
## master #203 +/- ##
==========================================
- Coverage 88.26% 87.65% -0.61%
==========================================
Files 14 14
Lines 1542 1555 +13
Branches 350 351 +1
==========================================
+ Hits 1361 1363 +2
- Misses 127 137 +10
- Partials 54 55 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| camelot/io.py | 100.00% <ø> (ø) | |
| camelot/utils.py | 81.26% <ø> (ø) | |
| camelot/image_processing.py | 82.14% <8.33%> (-12.38%) | :arrow_down: |
| camelot/cli.py | 86.77% <100.00%> (+0.11%) | :arrow_up: |
| camelot/parsers/lattice.py | 94.14% <100.00%> (+0.03%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d17dc43...9161ef3.
@NoReflex Thanks for the PR! The results look great! I don't have much image processing or OpenCV background, so I'll have to read up on the cv2.COLOR_BGR2HSV option. I also have a question: do you think this new code could handle the other background line cases too? If so, we could add it as an enhancement to the earlier option instead of creating a new one.
@vinayak-mehta cv2.COLOR_BGR2HSV is just a colorspace conversion from BGR (OpenCV's default channel order for color images) to HSV (Hue, Saturation, Value).
As for the question, this would fail if the table's cell colors are gray/colorless. That's why it's an option.
Technically it's still an enhancement, because the -color flag can only be used with -back, but yeah, I get your point and I think it's better if it's a separate option for handling edge cases.
EDIT: By failing, I just mean that the result will be worse than with the plain option. The code itself is pretty bulletproof; it's just some simple numpy array transformations.
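A quick way to see the gray/colorless limitation is to compare saturation values directly. This is only an illustrative sketch using Python's standard `colorsys` module instead of OpenCV; the helper name `saturation` and the sample colors are made up for the example.

```python
import colorsys

def saturation(r, g, b):
    """HSV saturation in [0, 1] for 8-bit RGB values (hypothetical helper)."""
    _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return s

pale_blue = saturation(230, 240, 255)   # faintly tinted cell background
light_gray = saturation(230, 230, 230)  # gray cell background
white = saturation(255, 255, 255)       # page background
```

Here `pale_blue` is small but nonzero, while `light_gray` and `white` are both exactly zero. A threshold on saturation therefore cannot tell a gray cell from the white page, which is why the step works as an opt-in rather than a default.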