pdf-lib
File size problem: 433 MB generated from a 15 MB document
What were you trying to do?
I'm trying to generate a new PDF document based on an existing one.
See the "How can we reproduce the issue?" section to download the original PDF document that is causing this issue.
How did you attempt to do it?
I'm using code similar to this:
import * as fs from "fs";
import { PDFDocument, PageSizes, ParseSpeeds } from "pdf-lib";

const pdfBytes = fs.readFileSync("original.pdf");
// Load a PDFDocument from the existing PDF bytes
const inputPdf = await PDFDocument.load(pdfBytes as ArrayBuffer, {
  ignoreEncryption: true,
  parseSpeed: ParseSpeeds.Fastest,
  capNumbers: true
});
// Create a new PDFDocument
this.output = await PDFDocument.create();
// Get the document pages (getPages() is synchronous, so no await is needed)
const pages = inputPdf.getPages();
for (let pageIndex = 0; pageIndex < pages.length; pageIndex++) {
  const page = pages[pageIndex];
  // Add a new A4 page to the output document
  const newPage = this.output.addPage(PageSizes.A4);
  // Embed the original page and scale it to 75%
  const embedPage = await this.output.embedPage(page);
  const scaledPageDims = embedPage.scale(0.75);
  newPage.drawPage(embedPage, {
    ...scaledPageDims,
    x: 10,
    y: 10
  });
}
// Serialize the PDFDocument to bytes (a Uint8Array)
const newPdfBytes = await this.output.save();
What actually happened?
The original document is 15 MB in size and the generated document is 433 MB.
What did you expect to happen?
I expected to get similar sizes from both the original and the generated document.
How can we reproduce the issue?
The code in the "How did you attempt to do it?" section reproduces this issue.
I think this is an issue specifically with this document, which is based on scanned images.
Version
1.17.1
What environment are you running pdf-lib in?
Node
Checklist
- [X] My report includes a Short, Self Contained, Correct (Compilable) Example.
- [X] I have attached all PDFs, images, and other files needed to run my SSCCE.
Additional Notes
No response
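One thing that may be worth trying with the loop above, as a minimal sketch only: embed all pages with a single `embedPages` call instead of calling `embedPage` once per page. The assumption here (not confirmed anywhere in this thread) is that pdf-lib copies objects separately for each call, so resources shared between pages, such as images or fonts referenced from a common resources dictionary, could be duplicated once per page. The helper names `embedAllPagesScaled` and `scaleOntoA4` are hypothetical; `embedPages`, `scale`, `drawPage`, and `PageSizes.A4` are existing pdf-lib APIs.

```javascript
// Hypothetical helper: embeds every page of `inputDoc` into `outputDoc`
// with ONE embedPages call, then draws each on a new page, scaled by `factor`.
async function embedAllPagesScaled(outputDoc, inputDoc, pageSize, factor) {
  const embeddedPages = await outputDoc.embedPages(inputDoc.getPages());
  for (const embedded of embeddedPages) {
    const newPage = outputDoc.addPage(pageSize);
    newPage.drawPage(embedded, { ...embedded.scale(factor), x: 10, y: 10 });
  }
  return embeddedPages.length;
}

// Hypothetical usage with pdf-lib, mirroring the reproduction code above.
// pdf-lib is required lazily so the helper itself has no hard dependency.
async function scaleOntoA4(inputBytes) {
  const { PDFDocument, PageSizes } = require('pdf-lib');
  const inputPdf = await PDFDocument.load(inputBytes);
  const output = await PDFDocument.create();
  await embedAllPagesScaled(output, inputPdf, PageSizes.A4, 0.75);
  return output.save(); // resolves to a Uint8Array
}
```

If the output size does not change with this variant, shared resources are probably not the cause for this particular document.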
Hi @Hopding, can you help me figure out what's wrong here? Thank you.
I'm also wondering about this.
Yeah, same issue here. Even simply copying the pages increases the size of the resulting PDF:
const copyDocument = async (buffer) => {
  console.log('initial size: ', buffer.byteLength); // 20296
  const newPdf = await PDFDocument.create();
  const initialPdf = await PDFDocument.load(buffer);
  const pages = initialPdf.getPages();
  for (let i = 0; i < pages.length; i++) {
    const [newPage] = await newPdf.copyPages(initialPdf, [i]);
    newPdf.addPage(newPage);
  }
  const bufferCopy = await newPdf.save();
  console.log('copy size: ', bufferCopy.byteLength); // 31691
};
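For comparison, here is a sketch of the same copy done with a single `copyPages` call for the whole document. The assumption (again, not confirmed in this thread) is that objects shared between pages get copied again on every separate `copyPages` call, so passing all indices at once might avoid some duplication. `copyAllPages` and `copyDocumentBatched` are hypothetical names; `getPageIndices`, `copyPages`, and `addPage` are existing pdf-lib methods.

```javascript
// Hypothetical helper: copies every page of `sourceDoc` into `targetDoc`
// using one copyPages call rather than one call per page.
async function copyAllPages(targetDoc, sourceDoc) {
  const indices = sourceDoc.getPageIndices(); // [0, 1, ..., pageCount - 1]
  const copiedPages = await targetDoc.copyPages(sourceDoc, indices);
  for (const page of copiedPages) {
    targetDoc.addPage(page);
  }
  return copiedPages.length;
}

// Hypothetical usage with pdf-lib (required lazily inside the function).
async function copyDocumentBatched(buffer) {
  const { PDFDocument } = require('pdf-lib');
  const initialPdf = await PDFDocument.load(buffer);
  const newPdf = await PDFDocument.create();
  await copyAllPages(newPdf, initialPdf);
  const bufferCopy = await newPdf.save();
  console.log('copy size: ', bufferCopy.byteLength);
  return bufferCopy;
}
```

Comparing `bufferCopy.byteLength` from this variant against the per-page loop on the same input would show whether the per-call copying accounts for the growth.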
Yes, I encountered the same issue: a 9 MB file split into files of 10 pages each grows to 60 MiB for each sub-file.
// split.pdf.js
const fs = require('fs');
const path = require('path');
const { PDFDocument } = require('pdf-lib');

const splitPDF = async (pdfFilePath, outputDirectory) => {
  const data = await fs.promises.readFile(pdfFilePath);
  const readPdf = await PDFDocument.load(data);
  const { length } = readPdf.getPages();
  for (let i = 0, n = length; i < n; i += 10) {
    const writePdf = await PDFDocument.create();
    // Stop at the last page so the final chunk never reads past the document
    for (let j = i; j < Math.min(i + 10, n); j += 1) {
      const [page] = await writePdf.copyPages(readPdf, [j]);
      writePdf.addPage(page);
    }
    const bytes = await writePdf.save();
    const outputPath = path.join(outputDirectory, `I100_${i + 1}.pdf`);
    await fs.promises.writeFile(outputPath, bytes);
    console.log(`Added ${outputPath}`);
  }
};

// Attach .catch to the promise chain, not to the return value of console.log
splitPDF('100.pdf', 'invoices')
  .then(() => console.log('File has been split!'))
  .catch(console.error);
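A variant of the split script that may be worth testing, as a sketch: compute the page indices for each output file up front and hand each group to `copyPages` in one call. Clamping the last group to the page count also avoids requesting indices past the end of the document. `pageChunks` and `splitPdfBatched` are hypothetical names, and whether this reduces the size of the sub-files for this particular input is untested.

```javascript
// Hypothetical helper: splits [0, pageCount) into consecutive groups of at
// most `chunkSize` indices; the last group is clamped to the page count.
function pageChunks(pageCount, chunkSize) {
  const chunks = [];
  for (let start = 0; start < pageCount; start += chunkSize) {
    const end = Math.min(start + chunkSize, pageCount);
    chunks.push(Array.from({ length: end - start }, (_, k) => start + k));
  }
  return chunks;
}

// Hypothetical usage: one copyPages call per output file.
async function splitPdfBatched(pdfFilePath, outputDirectory, chunkSize = 10) {
  const fs = require('fs');
  const path = require('path');
  const { PDFDocument } = require('pdf-lib');

  const readPdf = await PDFDocument.load(await fs.promises.readFile(pdfFilePath));
  for (const indices of pageChunks(readPdf.getPageCount(), chunkSize)) {
    const writePdf = await PDFDocument.create();
    const pages = await writePdf.copyPages(readPdf, indices);
    pages.forEach((page) => writePdf.addPage(page));
    const outputPath = path.join(outputDirectory, `I100_${indices[0] + 1}.pdf`);
    await fs.promises.writeFile(outputPath, await writePdf.save());
  }
}
```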
Have you tried using copyPages instead of embedPage?
// append to created pdf
const [copyPage] = await this.output.copyPages(inputPdf, [0])
this.output.addPage(copyPage)
Hi @p-kuen, I will give it a try. However, the code samples provided by @SergeiReutov and @ns-sjli use the copyPages method and have the same problem 🤔
Oh sorry, I should've looked more closely. I use copyPages myself and, as a workaround, run the whole merged PDF through Ghostscript for compression, so I never had problems with this. Not the cleanest solution, but effective.
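For anyone who wants to try the Ghostscript route from Node, a minimal sketch follows. The exact options used in the comment above are not given, so the flags here are an assumption based on common `pdfwrite` usage. It also assumes a `gs` binary on the PATH (on Windows it is typically named `gswin64c`). Note that presets such as `/ebook` downsample images, so the result is lossy. `ghostscriptArgs` and `compressWithGhostscript` are hypothetical names.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the argument list for a pdfwrite re-save.
// `preset` is one of Ghostscript's PDFSETTINGS values, e.g. '/screen',
// '/ebook', '/printer', '/prepress'.
function ghostscriptArgs(inputPath, outputPath, preset = '/ebook') {
  return [
    '-sDEVICE=pdfwrite',
    '-dCompatibilityLevel=1.4',
    `-dPDFSETTINGS=${preset}`,
    '-dNOPAUSE',
    '-dQUIET',
    '-dBATCH',
    `-sOutputFile=${outputPath}`,
    inputPath,
  ];
}

// Hypothetical helper: runs Ghostscript and resolves with the output path.
// Assumes the executable is called `gs` and is available on the PATH.
function compressWithGhostscript(inputPath, outputPath, preset) {
  return new Promise((resolve, reject) => {
    execFile('gs', ghostscriptArgs(inputPath, outputPath, preset), (error) =>
      error ? reject(error) : resolve(outputPath)
    );
  });
}
```

Typical use would be to write the bytes from `save()` to a temporary file, call `compressWithGhostscript(tmpPath, finalPath)`, and then delete the temporary file.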
Has anybody found a solution to this issue?
Same issue here. Has anybody found a solution?