
pdf2json: Performance on large PDF files

Open barneydunning opened this issue 8 years ago • 8 comments

Hi All,

I have a PDF file of about 500 pages (3.6 MB); I can't post it because it contains sensitive data. When I load it through pdf2json, it takes about 10 minutes for the dataReady callback to fire... is this expected?

I am running the Node application on a MacBook Pro (i7, 16 GB RAM)... and seriously expected it to be faster.

The PDF content is timetable-like... and all I want to extract are the text strings and their x/y positions, grouped by page.
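For that goal, the post-processing step can be kept small once the dataReady data arrives. The sketch below is a hedged illustration, not pdf2json's own API: it assumes a simplified version of pdf2json's JSON shape (a `Pages` array, each page with a `Texts` array whose items carry `x`, `y` and text runs in `R[].T`, URI-encoded). The interfaces and the `textByPage` helper are hypothetical names introduced here. It does not address the 10-minute parse time itself.

```typescript
// ASSUMPTION: simplified shape of pdf2json output; only the fields used here.
// Text in R[].T is assumed to be URI-encoded, as pdf2json emits it.
interface PdfTextRun {
  T: string;
}

interface PdfTextItem {
  x: number;
  y: number;
  R: PdfTextRun[];
}

interface PdfPage {
  Texts: PdfTextItem[];
}

interface PdfData {
  Pages: PdfPage[];
}

// One extracted string together with its position on the page.
interface PositionedText {
  text: string;
  x: number;
  y: number;
}

// Hypothetical helper: returns one array of positioned strings per page,
// in the same page order as the input.
function textByPage(data: PdfData): PositionedText[][] {
  return data.Pages.map((page) =>
    page.Texts.map((item) => ({
      // Concatenate all runs of the item and undo the URI encoding.
      text: item.R.map((run) => decodeURIComponent(run.T)).join(""),
      x: item.x,
      y: item.y,
    })),
  );
}
```

Inside the dataReady handler this would be called as `textByPage(pdfData)`. Older pdf2json releases may nest the pages under `pdfData.formImage.Pages`, so check which shape your version produces before relying on this.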

Does anyone else have performance issues with pdf2json... or can anyone suggest other Node modules for this purpose?

Looking forward to some help... and happy to answer any questions.

Ta.

barneydunning, Jun 22 '16 17:06