npm-pdfreader

Fails when uploading file that contains comments within PDF

Open • lauzonryd opened this issue 3 years ago • 8 comments

When uploading and processing a PDF that contains comments, pdfreader is unable to handle the request, and my backend Node service fails. I'm using `new PdfReader().parseBuffer(file, function (err, item) { … })` to process the buffered file. It reads the file and the first item, but then fails.
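A request handler can only catch a parse failure if it has something to `await`. One way to get that from pdfreader's callback-per-item interface is a small Promise wrapper with a timeout. This is a minimal sketch and is not part of pdfreader. `parseItems` is a hypothetical helper. It assumes pdfreader's documented item-handler convention: the callback receives `(err, item)`, and a falsy `item` marks the end of the file. The parser is passed in as a function, so the wrapper can be exercised without a real PDF.

```javascript
// Hypothetical helper: collects every item from a pdfreader-style parser
// into an array. Assumes the parser calls itemHandler(err, item) per item
// and itemHandler(null, undefined) once at end of file.
function parseItems(parseBuffer, buffer, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const items = [];
    let settled = false;
    const finish = (fn, value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      fn(value);
    };
    // If the parser goes quiet without signalling end of file, reject
    // instead of leaving the request pending forever.
    const timer = setTimeout(
      () => finish(reject, new Error(`PDF parsing timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    try {
      parseBuffer(buffer, (err, item) => {
        if (err) finish(reject, err instanceof Error ? err : new Error(String(err)));
        else if (!item) finish(resolve, items); // falsy item = end of file
        else if (!settled) items.push(item);
      });
    } catch (err) {
      // Only catches errors thrown synchronously by parseBuffer itself.
      finish(reject, err);
    }
  });
}
```

With pdfreader, you would call it as `parseItems((buf, cb) => new PdfReader().parseBuffer(buf, cb), fileBuffer)` inside a `try`/`catch`. The timeout only helps when the parser stops calling back. It cannot protect the process from an exception thrown asynchronously inside the parser, or from a loop that blocks the event loop.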

Is this a known bug? If so, is there any way I can handle it gracefully, or detect that the file has comments and return an error? I've tried some workarounds, but the service fails every time.
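To detect comments up front, it helps to know how a PDF stores them. Comments are annotation dictionaries (`/Type /Annot`), and each one's `/Subtype` names the kind of comment: `/Text` for sticky notes, `/FreeText`, `/Highlight`, and so on. A rough heuristic is to scan the raw bytes for those subtypes and reject the upload before it ever reaches pdfreader. This is a sketch under stated assumptions. `hasCommentAnnotations` is a hypothetical helper, and the subtype list is illustrative rather than exhaustive. A plain byte scan also cannot see annotations stored inside compressed object streams (PDF 1.5+), so it can return false negatives.

```javascript
// Heuristic: markup annotation subtypes that typically represent comments.
// The list is illustrative, not exhaustive.
const COMMENT_SUBTYPES = ["Text", "FreeText", "Highlight", "Underline", "StrikeOut", "Squiggly", "Ink", "Stamp", "Popup"];

// Matches e.g. "/Subtype /Text" or "/Subtype/Highlight", but not a longer
// name that merely starts with one of the subtypes above.
const COMMENT_PATTERN = new RegExp(`/Subtype\\s*/(?:${COMMENT_SUBTYPES.join("|")})(?![A-Za-z0-9])`);

// Returns true if the raw PDF bytes appear to contain comment annotations.
// May miss annotations stored in compressed object streams (false negatives).
function hasCommentAnnotations(pdfBuffer) {
  // latin1 maps each byte to exactly one character, so binary data is safe to scan.
  return COMMENT_PATTERN.test(pdfBuffer.toString("latin1"));
}
```

In an upload handler, you could call this on the uploaded buffer and return a client error (for example HTTP 422) instead of passing the file to pdfreader.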

lauzonryd • Aug 19 '21 15:08

Thanks for reporting this issue, Ryan.

In order to find a solution, our community needs a more precise description of the problem (e.g. an error message) and, if possible, a way to reproduce the issue. Could you share more details and provide a PDF file that fails to parse with pdfreader, please?

adrienjoly • Aug 19 '21 17:08

I was able to pin down exactly when this fails, and I'm attaching two PDFs here. When a comment is added onto a table, the server crashes. There is no error message. It just stops working completely, and the whole server needs to be restarted. If there were an error message, I would be able to handle it accordingly.

testingWithTable.pdf - this PDF fails

testingWithTable-noComment.pdf - this PDF does not fail
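Because the crash brings down the whole server without any error reaching the caller, a more robust option is to run the parsing in a separate thread. An uncaught exception, a `process.exit()` call, or an endless loop inside the parser then ends only that thread. The sketch below uses Node's built-in `worker_threads` module. `runIsolated` and `PARSE_WORKER` are hypothetical names. The worker source assumes pdfreader's `PdfReader().parseBuffer(buffer, itemHandler)` interface and assumes that parsed items are plain objects that can be copied between threads. This does not fix the underlying bug in pdf2json. It only keeps the server alive and turns the failure into a rejected Promise.

```javascript
const { Worker } = require("worker_threads");

// Example worker source for pdfreader (assumes pdfreader is installed and
// uses the (err, item) convention, with a falsy item at end of file).
const PARSE_WORKER = `
  const { parentPort, workerData } = require("worker_threads");
  const { PdfReader } = require("pdfreader");
  const items = [];
  // workerData arrives as a Uint8Array copy, so wrap it back into a Buffer.
  new PdfReader().parseBuffer(Buffer.from(workerData), (err, item) => {
    if (err) parentPort.postMessage({ error: String(err) });
    else if (!item) parentPort.postMessage(items); // end of file
    else items.push(item);
  });
`;

// Runs workerSource (CommonJS code) in a worker thread with input as
// workerData. Resolves with the first message the worker posts, unless that
// message has an "error" property. Rejects on an uncaught exception in the
// worker, an exit before any reply, or when timeoutMs elapses.
function runIsolated(workerSource, input, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: input });
    let done = false;
    const settle = (fn, value) => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      worker.terminate(); // also stops a worker stuck in an endless loop
      fn(value);
    };
    const timer = setTimeout(
      () => settle(reject, new Error(`worker timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    worker.once("message", (msg) =>
      msg && msg.error ? settle(reject, new Error(msg.error)) : settle(resolve, msg)
    );
    worker.once("error", (err) => settle(reject, err));
    worker.once("exit", (code) =>
      settle(reject, new Error(`worker exited with code ${code} before replying`))
    );
  });
}
```

For example, `runIsolated(PARSE_WORKER, fileBuffer, 15000)` resolves with the array of items, or rejects with the parser's error or a timeout. Starting one worker per upload has a startup cost, so a busy service might use a worker pool instead. A per-request worker keeps the sketch short.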


lauzonryd • Aug 19 '21 19:08

Awesome, thank you Ryan!

I’ll try to find some time to reproduce it in the coming days. If anybody wants to jump in and help us troubleshoot, please feel free to do so.

Until then, don’t hesitate to try opening that file directly with pdf2json and pdf.js, the libraries that pdfreader uses under the hood, to better understand which layer the problem comes from.

Talk to you soon!

Adrien

adrienjoly • Aug 20 '21 07:08

When I open these PDFs with pdf2json, I get the same result: the server crashes when a comment is placed within a table, and works as expected without comments.

lauzonryd • Aug 20 '21 13:08

Interesting… thank you for sharing this update!

In that case, the issue is probably worth reporting on pdf2json’s GitHub page, if you feel like it.

Once that gets fixed, I’ll update the dependency and release a new version of pdfreader.

Best regards,

Adrien

adrienjoly • Aug 22 '21 08:08

FYI, I just published a new version of pdfreader that uses the latest version of pdf2json: Release v1.2.11 · adrienjoly/npm-pdfreader

adrienjoly • Aug 27 '21 07:08

Thanks, I ran the update and tried again, but the issue persists. I opened an issue in the pdf2json repo, but I haven't gotten any response yet.

lauzonryd • Aug 27 '21 12:08

For reference, here's the issue you opened on pdf2json's repo: https://github.com/modesty/pdf2json/issues/242

I'm guessing that it was fixed in pdf2json v1.2.5. Unfortunately, that release introduced a breaking change that pdfreader does not support yet (see #95).

If you feel like it, you're welcome to propose a pull request to make pdfreader support that version of pdf2json.

adrienjoly • Nov 11 '21 10:11