camelot PDF sample - any way to improve extraction?

Open igvk opened this issue 1 year ago • 5 comments

Here is an example of a PDF with some incorrectly extracted data (in stream mode): V_1.pdf

Multi-line text isn't recognized as belonging to a single cell, so each line of it ends up in its own, very sparsely filled row.
The number 50 in the last row, column 4, is moved into the next cell to the right and merged with that cell's content.

Is it possible to improve the extraction of this table?
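
One possible workaround for the multi-line problem is to post-process the extracted rows, for example the list you get from `tables[0].df.values.tolist()`. The sketch below is only an illustration. It assumes that every logical row has a non-empty first column and that continuation lines leave that column empty. The helper `merge_continuation_rows` and the sample data are hypothetical and are not taken from V_1.pdf. Camelot's stream options `row_tol`, `columns` and `split_text` may also be worth trying, although it is not confirmed that they fix this particular file.

```python
def merge_continuation_rows(rows, key_col=0):
    """Fold rows whose key column is empty into the previous row.

    Assumption: a row with an empty cell in `key_col` is a continuation
    line of the row above it, not a logical row of its own.
    """
    merged = []
    for row in rows:
        cells = [c.strip() for c in row]
        if merged and not cells[key_col]:
            # Continuation line: append each non-empty cell to the cell above.
            prev = merged[-1]
            for i, cell in enumerate(cells):
                if cell:
                    prev[i] = f"{prev[i]} {cell}".strip()
        else:
            # Key column is filled (or this is the first row): new logical row.
            merged.append(cells)
    return merged


# Hypothetical sample that mimics the sparse rows produced in stream mode.
sample = [
    ["1", "Steel pipe,", "10", "50"],
    ["", "galvanized", "", ""],
    ["2", "Valve", "4", "7"],
]
result = merge_continuation_rows(sample)
for row in result:
    print(row)
```

This only addresses the sparse rows. The misplaced 50 is a column-boundary problem, and it would need explicit column separators or manual correction.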

Aug 07 '23 11:08 igvk