pdf-lib file size problem: 433 MB generated from a 15 MB document

What were you trying to do?

I'm trying to generate a new PDF document based on an existing one. See the "How can we reproduce the issue?" section to download the original PDF document that causes this issue.

How did you attempt to do it?

I'm using code similar to this:

// Excerpt from a class method; `this.output` is assumed to be a field on that class
import fs from "fs";
import { PDFDocument, PageSizes, ParseSpeeds } from "pdf-lib";

// Read the source file; a Node Buffer is a Uint8Array, so no cast is needed
const pdfBytes = fs.readFileSync("original.pdf");

// Load a PDFDocument from the existing PDF bytes
const inputPdf = await PDFDocument.load(pdfBytes, {
  ignoreEncryption: true,
  parseSpeed: ParseSpeeds.Fastest,
  capNumbers: true
});

// Create a new PDFDocument
this.output = await PDFDocument.create();

// Get the document pages (getPages() is synchronous, so no await)
const pages = inputPdf.getPages();

for (let pageIndex = 0; pageIndex < pages.length; pageIndex++) {
  const page = pages[pageIndex];

  // Add a new A4 page
  const newPage = this.output.addPage(PageSizes.A4);

  // Embed the original page and scale it to 75%
  const embedPage = await this.output.embedPage(page);
  const scaledPageDims = embedPage.scale(0.75);

  newPage.drawPage(embedPage, {
    ...scaledPageDims,
    x: 10,
    y: 10
  });
}

// Serialize the PDFDocument to bytes (a Uint8Array)
const newPdfBytes = await this.output.save();
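
A possible variation on the snippet above, offered as an untested hypothesis rather than a confirmed fix: the loop calls embedPage once per page, and each call may copy the source page's resources independently, so anything shared between pages could be written more than once. The sketch below embeds every page with a single embedPages call instead. embedAllScaled is a hypothetical helper name, and the source and destination documents are passed in as parameters.

```javascript
// Hypothetical helper: embed all pages of `srcDoc` with ONE embedPages call,
// then draw each embedded page, scaled, onto a fresh page of `pageSize`.
// `srcDoc` and `dstDoc` are expected to be pdf-lib PDFDocument instances.
async function embedAllScaled(srcDoc, dstDoc, pageSize, scale) {
  const embeddedPages = await dstDoc.embedPages(srcDoc.getPages());

  for (const embeddedPage of embeddedPages) {
    const newPage = dstDoc.addPage(pageSize);
    newPage.drawPage(embeddedPage, {
      ...embeddedPage.scale(scale), // { width, height } after scaling
      x: 10,
      y: 10,
    });
  }

  return embeddedPages.length;
}
```

With the names from the snippet above, the call would be something like `await embedAllScaled(inputPdf, this.output, PageSizes.A4, 0.75)`. Whether it reduces the output size depends on how many resources the pages actually share.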

What actually happened?

The original document is 15 MB in size and the generated document is 433 MB.

What did you expect to happen?

I expected the original and the generated document to be similar in size.

How can we reproduce the issue?

The code in the "How did you attempt to do it?" section reproduces this issue.

I think this issue is specific to this document, which is made up of scanned images.

Version

1.17.1

What environment are you running pdf-lib in?

Node

Checklist

[X] My report includes a Short, Self Contained, Correct (Compilable) Example.
[X] I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

No response

Nov 02 '22 17:11 juanludlf

Hi @Hopding, can you help me figure out what's wrong here? Thank you.

Nov 30 '22 11:11 juanludlf

Also wondering about this

Dec 05 '22 12:12 mrdavidrees

Yeah, same issue here. Even simply copying the pages increases the size of the resulting PDF:

const copyDocument = async (buffer) => {
  console.log('initial size: ', buffer.byteLength); // 20296
  const newPdf = await PDFDocument.create();
  const initialPdf = await PDFDocument.load(buffer);
  const pages = initialPdf.getPages();
  for (let i = 0; i < pages.length; i++) {
    const [newPage] = await newPdf.copyPages(initialPdf, [i]);
    newPdf.addPage(newPage);
  }
  const bufferCopy = await newPdf.save();
  console.log('copy size: ', bufferCopy.byteLength); // 31691
};
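
One thing that may be worth ruling out here (a hypothesis, not something confirmed in this thread): the loop calls copyPages once per page, and each call may copy objects independently of the others, so resources shared between pages, such as fonts or images, could end up duplicated in the output. A minimal sketch that copies every page in one call instead; copyAllPagesAtOnce is a hypothetical helper name, and both documents are passed in as parameters:

```javascript
// Hypothetical helper: copy every page of `srcDoc` into `dstDoc`
// with ONE copyPages call instead of one call per page.
// `srcDoc` and `dstDoc` are expected to be pdf-lib PDFDocument instances.
async function copyAllPagesAtOnce(srcDoc, dstDoc) {
  // getPageIndices() returns [0, 1, ..., pageCount - 1]
  const indices = srcDoc.getPageIndices();
  const copiedPages = await dstDoc.copyPages(srcDoc, indices);

  // Copied pages are not part of the document until they are added
  for (const page of copiedPages) {
    dstDoc.addPage(page);
  }

  return copiedPages.length;
}
```

In the function above this would replace the for loop with `await copyAllPagesAtOnce(initialPdf, newPdf)`. Comparing `bufferCopy.byteLength` for both variants would show whether per-call duplication is really what inflates the file.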

Dec 09 '22 12:12 SergeiReutov

Yes, I ran into the same issue: splitting a 9 MB file into files of 10 pages each produced sub-files of 60 MiB each.

// split.pdf.js
const fs = require('fs');
const path = require('path');
const { PDFDocument } = require('pdf-lib');

// Split the PDF at pdfFilePath into files of up to 10 pages each
const splitPDF = async (pdfFilePath, outputDirectory) => {
  const data = await fs.promises.readFile(pdfFilePath);
  const readPdf = await PDFDocument.load(data);
  const { length } = readPdf.getPages();

  for (let i = 0, n = length; i < n; i += 10) {
    const writePdf = await PDFDocument.create();
    // Stop at the last page so the final file never indexes past the end
    for (let j = i; j < Math.min(i + 10, n); j += 1) {
      const [page] = await writePdf.copyPages(readPdf, [j]);
      writePdf.addPage(page);
    }
    const bytes = await writePdf.save();
    const outputPath = path.join(outputDirectory, `I100_${i + 1}.pdf`);
    await fs.promises.writeFile(outputPath, bytes);

    console.log(`Added ${outputPath}`);
  }
};

// Chain .catch on the promise, not on the return value of console.log
splitPDF('100.pdf', 'invoices')
  .then(() => console.log('Files have been split!'))
  .catch(console.error);
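
A sketch of how the split could be restructured so that the page ranges are computed up front and each output file is built with a single copyPages call. This is an untested idea, not a confirmed fix for the size increase. chunkIndices and splitIntoDocs are hypothetical helper names, and createDoc is an injected factory (for example `() => PDFDocument.create()`), so the snippet itself does not import pdf-lib:

```javascript
// Hypothetical helper: split [0, pageCount) into consecutive ranges of at
// most `size` indices. The last range is shorter when pageCount is not a
// multiple of `size`.
function chunkIndices(pageCount, size) {
  const chunks = [];
  for (let start = 0; start < pageCount; start += size) {
    const end = Math.min(start + size, pageCount);
    chunks.push(Array.from({ length: end - start }, (_, k) => start + k));
  }
  return chunks;
}

// Hypothetical helper: build one output document per chunk, using one
// copyPages call per output document.
async function splitIntoDocs(srcDoc, size, createDoc) {
  const docs = [];
  for (const indices of chunkIndices(srcDoc.getPageCount(), size)) {
    const outDoc = await createDoc();
    const pages = await outDoc.copyPages(srcDoc, indices);
    pages.forEach((page) => outDoc.addPage(page));
    docs.push(outDoc);
  }
  return docs;
}
```

Each returned document can then be passed through `save()` and written to disk as in the script above. For a 25-page source and a size of 10, the ranges are pages 0 to 9, 10 to 19, and 20 to 24.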

Dec 10 '22 00:12 ns-sjli

Have you tried using copyPages instead of embedPage?

// append to the created pdf
const [copyPage] = await this.output.copyPages(inputPdf, [0]);
this.output.addPage(copyPage);

Dec 19 '22 05:12 p-kuen

Hi @p-kuen, I will give it a try. However, the code samples provided by @SergeiReutov and @ns-sjli use the copyPages method and have the same problem 🤔

Dec 19 '22 20:12 juanludlf

Oh sorry, I should have read more closely. I use copyPages myself and, as a workaround, run the whole merged PDF through Ghostscript for compression, so I never had problems with this. Not the cleanest solution, but effective.
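
For anyone who wants to try the Ghostscript route from Node, here is a minimal sketch. It is not the exact setup described above, which isn't shown in this thread. It assumes a `gs` binary is installed and on the PATH, and buildGsArgs and compressWithGhostscript are hypothetical helper names. Note that the /ebook preset downsamples images, so the result is lossy.

```javascript
const { execFile } = require('child_process');

// Build the argument list for a Ghostscript pdfwrite pass.
// `quality` selects the -dPDFSETTINGS preset:
// screen, ebook, printer, prepress, or default.
function buildGsArgs(inputPath, outputPath, quality = 'ebook') {
  return [
    '-sDEVICE=pdfwrite',
    `-dPDFSETTINGS=/${quality}`,
    '-dNOPAUSE',
    '-dBATCH',
    '-dQUIET',
    `-sOutputFile=${outputPath}`,
    inputPath,
  ];
}

// Run Ghostscript on an already saved PDF and resolve with the output path.
// Assumption: the executable is named `gs` (on Windows it is usually
// gswin64c instead).
function compressWithGhostscript(inputPath, outputPath, quality) {
  return new Promise((resolve, reject) => {
    execFile('gs', buildGsArgs(inputPath, outputPath, quality), (err) =>
      err ? reject(err) : resolve(outputPath)
    );
  });
}
```

Usage would be along the lines of writing the bytes from `save()` to a temporary file and then calling `await compressWithGhostscript('merged.pdf', 'merged.small.pdf')`, where both file names are placeholders.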

Dec 19 '22 23:12 p-kuen

Has anybody found a solution to this issue?

Feb 22 '23 22:02 vpatil007

Same issue here. Has anybody found a solution?

Feb 28 '24 06:02 weihuiling071
