tabula-java

Extraction of tables might include digital watermark

Open skwskwskwskw opened this issue 2 years ago • 1 comments

I am working on a PDF file where the extracted table sometimes includes a watermark. The watermark can appear at different locations. I am considering two approaches, but I am not sure how to implement either:

  1. Don't extract words that are rotated.
  2. Exclude text at the watermark's absolute location as seen in the PDF. However, tabula reports the watermark at a different location than where it appears on the page.
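
As a rough illustration of approach 1, here is a minimal sketch of how rotated text could be detected from a glyph's text matrix. It is plain Java and does not call tabula-java: the `Glyph` class and the `RotatedTextFilter` helper are hypothetical names made up for this example, and it assumes you can obtain the `a` and `b` entries of each glyph's text matrix from whatever PDF layer you are using. For a rotation by θ, those entries are cos θ and sin θ, so `atan2(b, a)` gives the angle back.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an extracted glyph; NOT a tabula-java class. */
final class Glyph {
    final String text;
    // First two entries of the glyph's text matrix: a = cos(theta), b = sin(theta)
    // (possibly scaled by the font size, which does not affect the angle).
    final double a;
    final double b;

    Glyph(String text, double a, double b) {
        this.text = text;
        this.a = a;
        this.b = b;
    }
}

/** Hypothetical helper that drops glyphs whose baseline is not horizontal. */
final class RotatedTextFilter {

    /** Rotation of the text baseline in degrees, normalised to [0, 360). */
    static double rotationDegrees(Glyph g) {
        double deg = Math.toDegrees(Math.atan2(g.b, g.a));
        return (deg % 360.0 + 360.0) % 360.0;
    }

    /** True when the glyph is within toleranceDegrees of 0 degrees. */
    static boolean isUpright(Glyph g, double toleranceDegrees) {
        double deg = rotationDegrees(g);
        // Distance to 0 degrees, measured around the circle.
        return Math.min(deg, 360.0 - deg) <= toleranceDegrees;
    }

    /** Returns a new list containing only the upright glyphs. */
    static List<Glyph> dropRotated(List<Glyph> glyphs, double toleranceDegrees) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (isUpright(g, toleranceDegrees)) {
                kept.add(g);
            }
        }
        return kept;
    }
}
```

The filter would have to run on the page's text elements before table detection, so that the watermark glyphs never reach the extraction step. A small tolerance (a degree or so) keeps text that is only slightly skewed, for example in scanned documents that were deskewed imperfectly.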

The watermark looks like this (the number that is rotated):

image

skwskwskwskw · Feb 20 '23 16:02

Hey, just wondering if you managed to find a solution or workaround for this problem? I have a similar PDF that has a text watermark at the side too.

germainepym · Jul 28 '23 03:07