PDF sample - any way to improve extraction?

Open igvk opened this issue 1 year ago • 5 comments

Here is an example of a PDF from which some data is extracted incorrectly (in stream mode): V_1.pdf

  1. Multi-line text isn't recognized as belonging to a single cell, so its lines end up spread sparsely across many rows.

  2. The number 50 in column 4 of the last row is moved into the cell to its right and merged with that cell's content.

Is it possible to improve the extraction of this table?
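
For illustration, here is a minimal post-processing sketch for the first problem. It assumes (this is an assumption, not something confirmed for V_1.pdf) that the first column holds a value only on the first line of each logical row, so any row with an empty first cell is a continuation of the row above it. The function name `merge_continuation_rows` and the sample data are hypothetical and are not part of Camelot; the rows would come from an extracted table converted to a list of lists.

```python
def merge_continuation_rows(rows, key_col=0):
    """Fold rows whose key column is empty into the preceding row.

    Hypothetical helper: assumes a non-empty cell in `key_col`
    marks the start of a new logical row.
    """
    merged = []
    for row in rows:
        cells = [str(c).strip() for c in row]
        if merged and not cells[key_col]:
            # Continuation line: append each cell to the cell above it.
            prev = merged[-1]
            merged[-1] = [f"{a} {b}".strip() for a, b in zip(prev, cells)]
        else:
            merged.append(cells)
    return merged


# Made-up sample mimicking sparsely distributed multi-line text.
sample = [
    ["1", "Long description", "10"],
    ["", "continued here", ""],
    ["2", "Short", "50"],
]
print(merge_continuation_rows(sample))
```

This only repairs the output after extraction. It does not address the second problem, where a value lands in the wrong column.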

igvk, Aug 07 '23 11:08