camelot
PDF sample - any way to improve extraction?
Here is an example PDF with some incorrectly extracted data (in stream mode): V_1.pdf
- Multi-line text isn't recognized as a single cell, so its lines are spread sparsely across several separate rows.
- The number 50 in the last row, column 4, is shifted into the next cell to the right and merged with that cell's contents.
Is it possible to improve the extraction of this table?
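One possible workaround for the multi-line problem is to post-process the extracted rows rather than the extraction itself. Below is a minimal sketch that assumes a wrapped line appears as its own row with an empty cell in a "key" column (column 0 here). That heuristic and the helper name `merge_continuation_rows` are hypothetical and not part of camelot. Tuning camelot's stream-mode `row_tol` parameter, which controls how text is combined vertically into rows, may also be worth trying first.

```python
def merge_continuation_rows(rows, key_col=0):
    """Merge rows whose key column is empty into the preceding row.

    Hypothetical heuristic: a wrapped line of multi-line text is assumed
    to show up as its own row with an empty cell in key_col. All rows are
    assumed to have the same number of cells.
    """
    merged = []
    for row in rows:
        cells = [c.strip() for c in row]
        if merged and not cells[key_col]:
            # Continuation row: append each non-empty cell to the row above.
            prev = merged[-1]
            for i, cell in enumerate(cells):
                if cell:
                    prev[i] = f"{prev[i]} {cell}".strip()
        else:
            # New logical row.
            merged.append(cells)
    return merged


rows = [
    ["1", "First line of a", "10"],
    ["", "wrapped description", ""],
    ["2", "Single-line item", "20"],
]
print(merge_continuation_rows(rows))
# [['1', 'First line of a wrapped description', '10'], ['2', 'Single-line item', '20']]
```

With camelot, the input rows could come from a table's DataFrame, e.g. `tables[0].df.values.tolist()`. This sketch does not address the shifted 50; for that, passing explicit column x-coordinates through stream mode's `columns` parameter might help, though whether it does for this particular PDF is untested.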