[MRG] Add saturation threshold option for low contrast tables

Open NoReflex opened this issue 5 years ago • 4 comments

I ran into this problem while trying to parse a table with a low-contrast background color. The -back option didn't work for low-contrast areas such as the last row, so I've added a new option, -color (--process_color_background), which increases the contrast so that the table lines are detected reliably.

Here's the result with camelot (master): example_camelot_master

Here's my branch with -color option enabled: example_camelot_branch

As you can see, the branch adds another step, which is basically a binary threshold that separates low-saturation pixels from pixels with no saturation. The borders are now much more pronounced, and camelot has no trouble detecting all the rows.
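
To illustrate the idea of a binary saturation threshold, here is a minimal sketch. It is not the code from this branch: the function name `saturation_mask`, the default `threshold` value, and the sample colors are all assumptions made for the example. It computes saturation with plain numpy on the 0-255 scale; with OpenCV, the same channel is the second one returned by `cv2.cvtColor(img, cv2.COLOR_BGR2HSV)` for an 8-bit image.

```python
import numpy as np


def saturation_mask(bgr, threshold=10):
    """Return a 0/255 mask of pixels whose HSV saturation exceeds `threshold`.

    Hypothetical helper, not part of camelot. `bgr` is an (H, W, 3) uint8
    image in BGR channel order. Saturation uses the 0-255 scale:
    S = 255 * (max - min) / max.
    """
    pixels = bgr.astype(np.float32)
    v_max = pixels.max(axis=2)
    v_min = pixels.min(axis=2)
    # Black pixels (max == 0) get saturation 0 instead of a division by zero.
    safe_max = np.where(v_max > 0, v_max, 1.0)
    saturation = 255.0 * (v_max - v_min) / safe_max
    # Binary threshold: any noticeable saturation becomes 255, the rest 0.
    return np.where(saturation > threshold, 255, 0).astype(np.uint8)


# White page with a pale-yellow table row (made-up sample data).
page = np.full((4, 6, 3), 255, dtype=np.uint8)
page[1:3, 1:5] = (200, 255, 255)  # BGR order: pale yellow
mask = saturation_mask(page)
```

In this sketch the pale-yellow cells end up at 255 in `mask` and the white page at 0, so the cell boundary becomes a hard edge regardless of how faint the original color was.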

NoReflex, Oct 23 '20 06:10

Codecov Report

Merging #203 into master will decrease coverage by 0.60%. The diff coverage is 21.42%.

@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
- Coverage   88.26%   87.65%   -0.61%     
==========================================
  Files          14       14              
  Lines        1542     1555      +13     
  Branches      350      351       +1     
==========================================
+ Hits         1361     1363       +2     
- Misses        127      137      +10     
- Partials       54       55       +1     
Impacted Files                 Coverage Δ
camelot/io.py                  100.00% <ø> (ø)
camelot/utils.py               81.26% <ø> (ø)
camelot/image_processing.py    82.14% <8.33%> (-12.38%) ↓
camelot/cli.py                 86.77% <100.00%> (+0.11%) ↑
camelot/parsers/lattice.py     94.14% <100.00%> (+0.03%) ↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d17dc43...9161ef3.

codecov-io, Oct 23 '20 07:10

@NoReflex Thanks for the PR! The results look great! I don't have much image processing or OpenCV background, so I'll have to read up on the cv2.COLOR_BGR2HSV option. I also have a question: do you think this new code could handle the other background line cases as well? That way we could add it as an enhancement to the earlier option instead of creating a new one.

vinayak-mehta, Oct 25 '20 00:10

@vinayak-mehta cv2.COLOR_BGR2HSV is just a colorspace conversion from BGR (OpenCV's channel order for RGB images) to HSV (Hue, Saturation, Value).
As for the question, this would fail if the table's cell colors are gray/colorless. That's why it's an option.
Technically it's still an enhancement, because the -color flag can only be used together with -back. But I get your point, and I think it's better to keep it as a separate option for handling edge cases.

EDIT: By "failing", I just mean that the result will be worse than with the plain -back option. The code itself is pretty robust; it's just a few simple numpy array transformations.
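
The gray/colorless limitation can be checked with a small sketch using Python's standard `colorsys` module. The helper name `saturation` and the sample colors are assumptions chosen for illustration, not values from the PR. It shows that a gray cell and a white page both have zero HSV saturation, so a saturation threshold has nothing to separate them by, while a pale-yellow cell does stand out.

```python
import colorsys


def saturation(r, g, b):
    """HSV saturation (0.0 to 1.0) of an 8-bit RGB color (hypothetical helper)."""
    _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return s


# Made-up sample colors for illustration.
gray_cell = saturation(200, 200, 200)         # colorless: R == G == B
white_page = saturation(255, 255, 255)        # also colorless
pale_yellow_cell = saturation(255, 255, 200)  # faint but nonzero color

# Gray and white are indistinguishable by saturation alone.
same_as_background = gray_cell == white_page
```

Under these assumptions only the pale-yellow cell has nonzero saturation, which is consistent with keeping the saturation step as an opt-in flag rather than folding it into -back.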

NoReflex, Oct 26 '20 06:10