pdf2json performance on a large PDF
Hi All,
I have a PDF file of about 500 pages (3.6 MB). I can't post it because it contains sensitive data. When I load it through pdf2json, it takes about 10 minutes to fire the dataReady callback... is this expected?
I am running the Node application on a MacBook Pro (i7, 16 GB)... and seriously expected it to be faster.
The PDF contents are of a timetable nature... and all I want to extract are the text strings and their x/y locations, grouped by page.
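To make the goal concrete, this is roughly the result shape I'm after. It is only a sketch: `groupTextByPage` is a made-up helper, and the `Pages` / `Texts` / `R` / `T` layout (with `T` URI-encoded, and `Pages` possibly nested under `formImage`) is my assumption about what pdf2json hands to the callback, so it may differ between versions.

```javascript
// Hypothetical helper: collects text strings and their x/y positions per page.
// Assumed input shape: Pages[] -> Texts[] -> { x, y, R: [{ T }] },
// where T is a URI-encoded string. Some releases may nest Pages under formImage.
function groupTextByPage(pdfData) {
  const root = pdfData.formImage || pdfData;
  const pages = root.Pages || [];

  return pages.map((page, index) => ({
    page: index + 1, // 1-based page number
    texts: (page.Texts || []).map((t) => ({
      x: t.x,
      y: t.y,
      // A text item can hold several runs; decode each and join them.
      text: (t.R || []).map((run) => decodeURIComponent(run.T)).join(""),
    })),
  }));
}
```

I would call this on whatever object the dataReady callback receives, so the slow part is the parsing itself and not this grouping step.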
Has anyone else had performance issues with pdf2json... or can anyone suggest other Node modules for this purpose?
Looking forward to some help... and I'm happy to answer any questions.
Ta.