pdfjs Cannot read property 'compressed' of undefined

Open mojoaxel opened this issue 2 years ago • 5 comments

We ran into a pdfjs (v2.4.5) problem: https://github.com/nbesli/pdf-merger-js/issues/42

The following code snippet...

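const fs = require('fs').promises // assumption: promise-based fs, since readFile is awaited
const path = require('path')
const pdf = require('pdfjs')

// FIXTURES_DIR and TMP_DIR are directory paths defined elsewhere; run inside an async function
const doc = new pdf.Document()
const src = await fs.readFile(path.join(FIXTURES_DIR, 'issue-42.pdf'))
const ext = new pdf.ExternalDocument(src)
doc.addPagesOf(ext)
const fileBuffer = await doc.asBuffer()
await fs.writeFile(path.join(TMP_DIR, 'Testfile_issue-42.pdf'), fileBuffer)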

...results in this error:

TypeError: Cannot read property 'compressed' of undefined

      at parseObject (node_modules/pdfjs/lib/object/reference.js:81:15)
      at PDFReference.get [as object] (node_modules/pdfjs/lib/object/reference.js:15:17)
      at Function.addObjectsRecursive (node_modules/pdfjs/lib/parser/parser.js:68:35)
      at Function.addObjectsRecursive (node_modules/pdfjs/lib/parser/parser.js:84:18)
      at Function.addObjectsRecursive (node_modules/pdfjs/lib/parser/parser.js:75:16)
      at ExternalDocument.write (node_modules/pdfjs/lib/external.js:62:14)

Please find the problematic PDF file attached: issue-42.pdf

Aug 01 '21 15:08 mojoaxel

Thanks for the report! I looked into it, and the cause seems to be that pdfjs does not support hybrid-reference files. More specifically, support for the XRefStm entry of the trailer is not yet implemented. While pdfjs successfully falls back to the normal xref table (instead of the xref stream), that table is missing the object with ID 46, which is therefore unknown and causes the error you've posted.

Possible solutions:

- Implement support for XRefStm
- Silently ignore missing objects (I am not sure if I'd like this solution though)

I don't have the time right now to implement it, but I'll keep it in the back of my mind.

Aug 18 '21 08:08 rkusa

Any suggested temporary fixes that we might be able to use to circumvent this error while the issue is waiting to be resolved?

Oct 05 '21 23:10 hobgoblina

Is there a way to check whether a PDF is a hybrid-reference file? I'm currently having this problem, and I'd like to skip the merge for such files if they can be detected up front.

Oct 15 '21 21:10 cah-andy-kim
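
One possible way to check for this before merging, given as a rough sketch rather than anything pdfjs provides: a hybrid-reference file carries an XRefStm entry inside its classic trailer dictionary, so scanning the raw bytes for that entry should flag most affected files. The function name looksLikeHybridReference is hypothetical, and the check is a plain text heuristic that does not parse the PDF, so it could misfire on unusual files.

```javascript
// Heuristic sketch (not part of pdfjs): report whether a PDF buffer appears to be
// a hybrid-reference file, i.e. a classic trailer dictionary with an /XRefStm entry.
function looksLikeHybridReference(buffer) {
  // latin1 maps every byte to exactly one character, so binary data survives.
  const text = buffer.toString('latin1');
  let pos = text.indexOf('trailer');
  while (pos !== -1) {
    // A trailer dictionary is followed by the "startxref" keyword.
    const end = text.indexOf('startxref', pos);
    const trailer = text.slice(pos, end === -1 ? text.length : end);
    // /XRefStm is followed by the byte offset of the cross-reference stream.
    if (/\/XRefStm\s+\d+/.test(trailer)) return true;
    pos = text.indexOf('trailer', pos + 1);
  }
  return false;
}
```

It could be called on the buffer returned by fs.readFile before constructing pdf.ExternalDocument, and the file skipped (or pre-processed) when it returns true.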

Hi everyone! I have a small workaround, but it will not suit everyone, and it requires node-pdftk:

import pdf from 'pdfjs';
import fs from 'fs';
import pdftk from 'node-pdftk';

const src = await pdftk.input('issue-42.pdf').output(); // rewrite the file with pdftk first and use its output as the source
const doc = new pdf.Document();
const ext = new pdf.ExternalDocument(src);
doc.addPagesOf(ext);
const fileBuffer = await doc.asBuffer();
fs.writeFileSync('Testfile_issue-42.pdf', fileBuffer);

It looks like node-pdftk converts the xref stream into a regular xref table (which means the file gets larger), so pdfjs can work with it. Testfile_issue-42.pdf looks the same after running the code above, but links are now just plain text.

Nov 23 '21 14:11 shu512

Running into this now.. Any actual fixes?

Jul 19 '22 18:07 sjd2021
