
multi-line cells

Open martinszy opened this issue 7 years ago • 3 comments

How well does this library handle wrapped text inside cells? Is each wrapped line read as a separate row?

There's another project that appears to solve this issue: https://github.com/tabulapdf/tabula

martinszy avatar Feb 25 '17 05:02 martinszy

Look at the first use case in the README, where I used it to get a txt table of a bus schedule. I didn't know about tabula. Thanks for sharing it.

JonathanLink avatar Feb 25 '17 11:02 JonathanLink

I've just tried tabula, and it's very good for extracting only tables, whereas with my class one can extract everything (forms, tables, paragraphs, etc.).

JonathanLink avatar Feb 25 '17 13:02 JonathanLink

In the example, the third cell of the first row is misrepresented as two rows instead of one long text in the first row. Tabula solves that; maybe you can copy parts of their code.

martinszy avatar Feb 25 '17 15:02 martinszy
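
For illustration, here is a minimal sketch of one way the wrapped-cell problem described above could be handled as a post-processing step on layout-preserved text. It is not how tabula or PDFLayoutTextStripper works; the class `WrappedRowMerger`, its methods, and the fixed `columnStarts` offsets are all hypothetical. It assumes the column start positions are already known and that a line with an empty first column is a continuation of the previous row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFLayoutTextStripper or tabula.
class WrappedRowMerger {

    // Merges continuation lines into the preceding row.
    // Assumption: a line whose first column is empty continues the row above it.
    static List<String[]> merge(List<String> lines, int[] columnStarts) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = split(line, columnStarts);
            boolean continuation = cells[0].isEmpty() && !rows.isEmpty();
            if (!continuation) {
                rows.add(cells);
                continue;
            }
            // Append each non-empty fragment to the same column of the previous row.
            String[] previous = rows.get(rows.size() - 1);
            for (int i = 0; i < cells.length; i++) {
                if (!cells[i].isEmpty()) {
                    previous[i] = previous[i].isEmpty() ? cells[i] : previous[i] + " " + cells[i];
                }
            }
        }
        return rows;
    }

    // Cuts a fixed-width line into trimmed cells at the given character offsets.
    static String[] split(String line, int[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        for (int i = 0; i < columnStarts.length; i++) {
            int start = Math.min(columnStarts[i], line.length());
            int end = (i + 1 < columnStarts.length)
                    ? Math.min(columnStarts[i + 1], line.length())
                    : line.length();
            cells[i] = line.substring(start, end).trim();
        }
        return cells;
    }
}
```

The heuristic breaks down when the first column itself wraps or is legitimately empty, so real tables would likely need extra signals such as vertical spacing or ruling lines.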