tabula-java
Extraction of tables might include digital watermark
I am working with a PDF file where the extracted table might include watermark text. The watermark can appear at different locations. I am considering two approaches, but I am not sure how to implement either of them:
- Don't extract words that are rotated.
- Exclude the watermark by its absolute location as seen in the PDF. However, tabula reports the watermark at a different location.
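For the first approach, here is a minimal sketch of how rotated text could be detected, assuming each text chunk comes with the `a` and `b` entries of its PDF text matrix `[a b c d e f]`. The `Chunk` class, `RotatedTextFilter`, and the tolerance value are hypothetical names for illustration and are not part of tabula-java. With PDFBox (which tabula-java builds on) the matrix can probably be read from `TextPosition.getTextMatrix()`, but that wiring is left out here. The sketch also assumes the page itself is not rotated.

```java
import java.util.ArrayList;
import java.util.List;

class RotatedTextFilter {

    /** Hypothetical text chunk: the text plus the a and b entries of its text matrix. */
    static final class Chunk {
        final String text;
        final double a; // scale * cos(theta)
        final double b; // scale * sin(theta)

        Chunk(String text, double a, double b) {
            this.text = text;
            this.a = a;
            this.b = b;
        }
    }

    /** Rotation of the text baseline in degrees, in the range [-180, 180]. */
    static double rotationDegrees(double a, double b) {
        return Math.toDegrees(Math.atan2(b, a));
    }

    /** A chunk counts as rotated if its baseline deviates from horizontal by more than the tolerance. */
    static boolean isRotated(Chunk chunk, double toleranceDegrees) {
        return Math.abs(rotationDegrees(chunk.a, chunk.b)) > toleranceDegrees;
    }

    /** Returns only the chunks whose baseline is (nearly) horizontal. */
    static List<Chunk> keepUpright(List<Chunk> chunks, double toleranceDegrees) {
        List<Chunk> kept = new ArrayList<>();
        for (Chunk chunk : chunks) {
            if (!isRotated(chunk, toleranceDegrees)) {
                kept.add(chunk);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double s = 12.0; // font scale, arbitrary for this example
        double theta = Math.toRadians(45);
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("Total", s, 0.0));                                  // upright table text
        chunks.add(new Chunk("123456", s * Math.cos(theta), s * Math.sin(theta))); // rotated watermark
        for (Chunk chunk : keepUpright(chunks, 1.0)) {
            System.out.println(chunk.text);
        }
    }
}
```

The remaining upright chunks would then be passed on to table extraction. A small tolerance is used instead of an exact comparison with zero, because matrix entries are floating-point values.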
The watermark looks like this (the number that is rotated):

Hey, just wondering if you managed to find a solution or workaround for this problem? I have a similar PDF that has a text watermark at the side too.