pdf-lib
TypeError: _this.catalog.Pages(...).traverse is not a function
What were you trying to do?
I am trying to load a 90-page PDF into the library.
How did you attempt to do it?
Here is a simple reproduction of the issue:
const { PDFDocument } = require("pdf-lib");
const fs = require("fs");
const fileWithError = fs.readFileSync("./policy-doc-test.pdf");
async function main() {
  const parentPDFDoc = await PDFDocument.load(fileWithError);
  console.log(parentPDFDoc.getPageCount());
}
main();
What actually happened?
I get the error "TypeError: _this.catalog.Pages(...).traverse is not a function" whenever I call any API that needs to traverse the pages, including getPageCount, save, etc.
What did you expect to happen?
I expected these functions to work normally, e.g. getPageCount() returning 90.
How can we reproduce the issue?
Run the code snippet above with Node.
Version
1.17.1
What environment are you running pdf-lib in?
Node
Checklist
- [X] My report includes a Short, Self Contained, Correct (Compilable) Example.
- [X] I have attached all PDFs, images, and other files needed to run my SSCCE.
Additional Notes
Above is the code snippet for reproducing the issue. The document is a somewhat sensitive PDF, so I'd prefer not to attach it here publicly. I can share the PDF privately via DM or email.
Some more context:
This is a 90-page document (3.8 MB). Opening it in Acrobat produces an error. I'm not sure whether that's related, but I suspect it could be.
Here's the fun part: after re-exporting the file from Acrobat, pdf-lib opens it without problems, so Acrobat is fixing something on export. I'm just not sure what, and unfortunately re-exporting through Acrobat isn't an option for this task.
data:image/s3,"s3://crabby-images/e99d4/e99d4d4f4c557c763d48d9637a51658e3fbdb162" alt="Screen Shot 2022-01-04 at 3 47 39 PM"
Does anyone know what might be going on, or how to fix this? Thanks!
I'm facing the same issue. My PDF is around 60 to 70 pages.
Same issue here. Is there any time frame for when this will be looked into or fixed?
We are experiencing the same issue. Any news on this? Glad to help any way I can.
Same issue. Any chance of a fix soon?
No fix yet.
We are also experiencing this issue with a specific PDF.
I also stumbled over this.
In my case, the reason was that the /Pages dict doesn't have /Type set to /Pages. That caused the PDF parser to instantiate the object as a plain PDFDict instead of a PDFPageTree.
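To see why a missing /Type surfaces as "traverse is not a function", here is a deliberately simplified, hypothetical model of type-based dispatch. ToyDict, ToyPageTree and instantiate are stand-ins invented for illustration, not pdf-lib's real classes, and the real parser also handles other types (catalog, page leaves, streams). The point is only that the class, and therefore the available methods, is chosen from the /Type entry.

```javascript
// Stand-ins for illustration only; these are NOT pdf-lib classes.
class ToyDict {
  constructor(map) {
    this.map = map;
  }

  get(key) {
    return this.map.get(key);
  }
}

// Only the page-tree subclass knows how to walk its /Kids.
class ToyPageTree extends ToyDict {
  traverse(visit) {
    for (const kid of this.get("Kids") ?? []) {
      visit(kid);
    }
  }
}

// Choose the class from the /Type entry; anything unrecognised falls back to a plain dict.
function instantiate(map) {
  return map.get("Type") === "Pages" ? new ToyPageTree(map) : new ToyDict(map);
}

const withType = instantiate(new Map([["Type", "Pages"], ["Kids", ["p1", "p2"]], ["Count", 2]]));
const withoutType = instantiate(new Map([["Kids", ["p1", "p2"]], ["Count", 2]]));

const visited = [];
withType.traverse((kid) => visited.push(kid));
console.log(visited); // [ 'p1', 'p2' ]

// The dict without /Type has no traverse method, so calling it throws a TypeError.
console.log(typeof withoutType.traverse); // undefined
```

The page tree's content is identical in both cases; only the missing /Type changes which class is built, which is why adding /Type and swapping in a PDFPageTree is enough to make the workaround below succeed.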
I was successful with the following workaround:
const { PDFDocument, PDFName, PDFPageTree } = require('pdf-lib')
const pdfDoc = await PDFDocument.load(bytes)
// Find reference to the page tree
const pagesRef = pdfDoc.catalog.get(PDFName.of('Pages'))
// Get the page tree. This is a plain PDFDict because /Type is missing.
const oldPageTree = pdfDoc.context.lookup(pagesRef)
// Create a PDFPageTree with the same entries (public API, no private fields).
const newPageTree = PDFPageTree.fromMapWithContext(new Map(oldPageTree.entries()), pdfDoc.context)
// Set the correct `Type`.
newPageTree.set(PDFName.of('Type'), PDFName.of('Pages'))
// Replace the PDFDict with the PDFPageTree in the document.
pdfDoc.context.assign(pagesRef, newPageTree)
// Save fixed document
...
In my case the PDFDocument.catalog property was initialised with a PDFDict instead of a PDFCatalog. So here is my workaround for the bug:
import { PDFCatalog, PDFDict, PDFDocument } from 'pdf-lib';
const doc = await PDFDocument.load(bytes, { ignoreEncryption: true });
if (!(doc.catalog instanceof PDFCatalog) && ((doc.catalog as any) instanceof PDFDict)) {
  (doc as any).catalog = PDFCatalog.fromMapWithContext(doc.catalog, doc.context);
}
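Both workarounds above deal with an object that pdf-lib loaded as a plain PDFDict. For the /Pages case the cause was a missing /Type entry. Assuming the catalog case has the same cause (the comment above doesn't say), a rough way to check a file without sharing it or running pdf-lib is to scan its raw bytes. The checkTypeEntries helper below is hypothetical, not part of pdf-lib, and only sees objects stored as plain text: entries inside compressed object streams won't be found, so a false result is a hint, not proof.

```javascript
// Hypothetical diagnostic helper (not part of pdf-lib).
// Reports whether the raw bytes contain "/Type /Catalog" and "/Type /Pages" entries.
// Objects packed inside compressed object streams are invisible to this scan.
function checkTypeEntries(bytes) {
  // latin1 maps every byte to one character, so binary content can't break decoding.
  const text = Buffer.from(bytes).toString("latin1");
  // A PDF name ends at whitespace or a delimiter; this lookahead keeps
  // "/Pages" from matching a longer name such as "/PagesX".
  const nameEnd = "(?![^\\s()<>\\[\\]{}/%])";
  const hasType = (name) => new RegExp(`/Type\\s*/${name}${nameEnd}`).test(text);
  return {
    catalogHasType: hasType("Catalog"),
    pagesHasType: hasType("Pages"),
  };
}

// Usage (path taken from the original report):
// console.log(checkTypeEntries(require("fs").readFileSync("./policy-doc-test.pdf")));
```

A pagesHasType of false on a file whose objects aren't compressed would point at the same problem as the /Pages report; for compressed files the scan can't tell, and loading the file in pdf-lib and inspecting the parsed objects is the more reliable check.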
For me it wasn't working because the Catalog pointed to the wrong object. I did this to manually point the Catalog at a PDFPageTree:
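const pdfLib = require("pdf-lib");
// Find the root page tree: a PDFPageTree without a /Parent entry
// (intermediate page tree nodes always have one).
let pagesRef;
for (const [ref, obj] of pdfDoc.context.enumerateIndirectObjects()) {
  if (obj instanceof pdfLib.PDFPageTree && !obj.has(pdfLib.PDFName.of("Parent"))) {
    pagesRef = ref;
    break;
  }
}
// Pass the ref, not the object, so /Pages stays an indirect reference.
if (pagesRef) {
  pdfDoc.catalog = pdfLib.PDFCatalog.withContextAndPages(pdfDoc.context, pagesRef);
}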