npm-pdfreader

Unable to catch Parse error

Open Ggabytes opened this issue 3 years ago • 2 comments

Hi @adrienjoly ,

I am using pdfreader to parse ### PDF documents. However, if my application hits a runtime error while parsing a PDF, I want to run specific fallback logic. Below are the code and the exception trace from reading a PDF document. The problem is that the error is not caught by the if (err) condition. Am I missing something needed to catch the exception shown below?

Thanks, Ji

Code snippet and exception trace:

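const { PdfReader } = require("pdfreader");

function readPDFPages(buffer, reader = new PdfReader()) {
  console.log('reading pdf pages: ');
  console.log(buffer);

  return new Promise((resolve, reject) => {
    const pages = [];
    reader.parseBuffer(buffer, (err, item) => {
      if (err) {
        // Errors that the parser reports through the callback land here.
        console.log("err in parsed buffer");
        console.log(err);
        reject(err);
      } else if (!item) {
        // A falsy item signals the end of the buffer.
        resolve(pages);
      } else if (item.page) {
        // Page marker: record it and keep reading.
        pages.push(item.page);
      }
    });
  });
}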

Exception trace:
Error: Illegal character: 41
    at error (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:195:9)
    at Lexer_getObj [as getObj] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:24616:11)
    at Parser_shift [as shift] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:24038:32)
    at Parser_makeStream [as makeStream] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:24195:12)
    at Parser_getObj [as getObj] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:24079:18)
    at XRef_fetch [as fetch] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:5753:22)
    at XRef_fetchIfRef [as fetchIfRef] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:5699:19)
    at Dict_get [as get] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:4759:28)
    at Page_getPageProp [as getPageProp] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:4213:28)
    at Page.get content [as content] (eval at <anonymous> (/var/task/node_modules/pdf2json/lib/pdf.js:64:1), <anonymous>:4227:19)

Ggabytes, Dec 07 '21 16:12

Hi, thanks for the heads up.

It's possible that our underlying pdf parser (pdf2json) does not pass this error to the caller...

It may be worth forcing an upgrade to pdf2json v2 to see if that works on your side. See https://github.com/adrienjoly/npm-pdfreader/pull/97#issue-1073387472 for recommendations on how to do so. Please let us know how it goes!

Adrien

adrienjoly, Dec 11 '21 12:12

Also, would you be willing to share that PDF file with us, if possible? It would help us create a test case to make sure that we handle this kind of error in the future.

adrienjoly, Dec 11 '21 12:12
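
For readers wondering why the if (err) branch might never run: a throw that happens synchronously inside a Promise executor is converted into a rejection, but a throw from a later tick inside a library bypasses both the callback and the promise. The sketch below illustrates that situation and one possible guard. It makes no claim about where pdf2json actually throws. throwingParser, okParser and parseGuarded are hypothetical names invented for this illustration and are not part of pdfreader or pdf2json. The only real API used is Node's process "uncaughtException" event.

```javascript
// Hypothetical stand-in for a parser that throws from a later tick
// instead of calling back with an error (NOT pdfreader's real code).
function throwingParser(buffer, callback) {
  setImmediate(() => {
    throw new Error("Illegal character: 41");
  });
}

// Hypothetical well-behaved parser: emits two items, then a falsy item
// to signal the end of the buffer.
function okParser(buffer, callback) {
  setImmediate(() => {
    callback(null, { page: 1 });
    callback(null, { text: "hello" });
    callback(null, undefined);
  });
}

// Wraps a callback-style parser in a promise. Besides the usual `err`
// argument, it also rejects when an error escapes as an uncaught
// exception while parsing is in flight.
// Assumption: only one guarded parse runs at a time, because the
// handler is process-wide and cannot tell whose error it received.
function parseGuarded(buffer, parse) {
  return new Promise((resolve, reject) => {
    const items = [];
    const onUncaught = (err) => reject(err);
    const cleanup = () =>
      process.removeListener("uncaughtException", onUncaught);

    process.once("uncaughtException", onUncaught);

    parse(buffer, (err, item) => {
      if (err) {
        cleanup();
        reject(err);
      } else if (!item) {
        cleanup();
        resolve(items);
      } else {
        items.push(item);
      }
    });
  });
}
```

With pdfreader, the second argument could be something like (b, cb) => reader.parseBuffer(b, cb). Whether this would catch the trace above depends on how the error is actually raised. Because the handler is process-wide, it can also pick up unrelated errors from other work running at the same moment, so this is better treated as a stopgap than as a substitute for the parser reporting the error itself.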