PDFLayoutTextStripper
multi-line cells
How well does this library handle wrapped text inside cells? Are the wrapped lines read as separate rows?
There's another project that seems to address this issue: https://github.com/tabulapdf/tabula
Look at the first use case in the README, where I used it to get a plain-text table of a bus schedule. I didn't know about tabula; thanks for sharing it.
I've just tried tabula: it's very good at extracting tables only, whereas my class can extract everything (forms, tables, paragraphs, etc.).
In that example, the third cell of the first row is rendered as two rows instead of as a single long text in the first row. Tabula handles this correctly; maybe you could borrow parts of its code.
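Short of borrowing tabula's code, one possible workaround is to post-process the plain-text output and fold wrapped lines back into their row. The following is a minimal sketch, not part of PDFLayoutTextStripper or tabula. It assumes the table's columns start at known, fixed character offsets (the hypothetical `columnStarts` array, in ascending order) and treats any line whose first cell is blank as a continuation of the row above. The class and method names (`WrappedRowMerger`, `mergeWrappedRows`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges wrapped cell text in fixed-width table output.
public class WrappedRowMerger {

    // Splits one fixed-width line into trimmed cells.
    // Assumes columnStarts is sorted in ascending order.
    static List<String> splitCells(String line, int[] columnStarts) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < columnStarts.length; i++) {
            int start = Math.min(columnStarts[i], line.length());
            int end = (i + 1 < columnStarts.length)
                    ? Math.min(columnStarts[i + 1], line.length())
                    : line.length();
            cells.add(line.substring(start, end).trim());
        }
        return cells;
    }

    // Heuristic: a line whose first cell is empty continues the previous row,
    // so each of its non-empty cells is appended to the matching cell above.
    public static List<List<String>> mergeWrappedRows(List<String> lines, int[] columnStarts) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // skip empty layout lines
            }
            List<String> cells = splitCells(line, columnStarts);
            boolean continuation = cells.get(0).isEmpty() && !rows.isEmpty();
            if (continuation) {
                List<String> previous = rows.get(rows.size() - 1);
                for (int i = 0; i < cells.size(); i++) {
                    String extra = cells.get(i);
                    if (extra.isEmpty()) {
                        continue;
                    }
                    String current = previous.get(i);
                    previous.set(i, current.isEmpty() ? extra : current + " " + extra);
                }
            } else {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the second line is a wrapped third cell.
        List<String> lines = List.of(
                "08:00     Central   Stops at the main",
                "                    square and the port",
                "08:30     Airport   Express");
        int[] columnStarts = {0, 10, 20};
        for (List<String> row : mergeWrappedRows(lines, columnStarts)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This heuristic breaks when the first column itself wraps, or when a genuine row has an empty first cell; an approach that works on the text positions in the PDF, rather than on the rendered text, would be more robust.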