pdfrw

Missing pages in opened PDF

Open AldoErco opened this issue 5 years ago • 2 comments

I have a particular PDF file that seems to break pdfrw's PDF-DOM mapping: pdfrw appears unable to read all of its pages. When I open it and print the number of pages the usual way:

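from pdfrw import PdfReader, PdfName
pdf = 'C:\\example.pdf'
reader = PdfReader(pdf)  # named "reader" rather than "input" to avoid shadowing the built-in
# n1: number of pages pdfrw found; n2: the /Count entry stored in /Root /Pages
print('Num pages: {n1} and {n2}'.format(n1=reader.numPages, n2=reader[PdfName('Root')][PdfName('Pages')][PdfName('Count')]))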

I get the output:

Num pages: 1 and 1

But the PDF is actually 5 pages long in Acrobat Reader (and in other readers too). Unfortunately, I cannot attach the PDF because it contains sensitive information, and if I remove that information with a tool, the edited file loads correctly. Any idea what the problem could be?
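Both numbers describe the document's page tree: the root `/Pages` node declares a total in its `/Count` entry, and a reader that walks the `/Kids` arrays down to the `/Page` leaves should arrive at the same number. The sketch below models that tree with plain Python dicts (a hypothetical stand-in, not pdfrw's object API) to show the two quantities being compared. Since both come out as 1 here, pdfrw appears to be reading a page tree that genuinely contains one page, rather than miscounting a five-page one.

```python
from typing import Any, Dict

# Hypothetical model: each node is a plain dict keyed by PDF names.
# This is NOT pdfrw's API, only an illustration of the page-tree structure.
Node = Dict[str, Any]


def count_leaf_pages(node: Node) -> int:
    """Walk the page tree and count /Page leaves (the pages a reader collects)."""
    if node.get("/Type") == "/Page":
        return 1
    # Intermediate /Pages nodes list their children under /Kids.
    return sum(count_leaf_pages(kid) for kid in node.get("/Kids", []))


def declared_count(pages_root: Node) -> int:
    """Return the total stored in the root /Pages node's /Count entry."""
    return int(pages_root.get("/Count", 0))


# A five-page tree: one intermediate /Pages node holding 3 pages, plus 2 direct pages.
pages_root = {
    "/Type": "/Pages",
    "/Count": 5,
    "/Kids": [
        {"/Type": "/Pages", "/Count": 3, "/Kids": [{"/Type": "/Page"} for _ in range(3)]},
        {"/Type": "/Page"},
        {"/Type": "/Page"},
    ],
}

print(count_leaf_pages(pages_root), declared_count(pages_root))  # → 5 5
```

If the walked count and `/Count` disagreed, the tree itself would be inconsistent; when they agree but differ from what other viewers show, the reader is more likely looking at a different tree altogether.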

AldoErco · Sep 17 '19 14:09

I am having the same issue. I have been investigating it all day, and I think it's caused by the PDF having two startxref sections. If I can figure it out, I'll submit a patch.
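To check whether a given file has more than one `startxref` section, one can scan its raw bytes. A minimal standard-library sketch follows; `find_startxref_offsets` and `scan_pdf` are hypothetical helpers, and the regex is a heuristic that could also match the keyword inside stream data. Each match gives the byte offset of a cross-reference section. A file saved with an incremental update, for example, gains a new section whose trailer points back to the previous one via `/Prev`, and per the PDF specification a conforming reader reads the file from its end, i.e. it starts from the last `startxref`.

```python
import re
from typing import List

# Heuristic: the keyword "startxref", whitespace, then a decimal byte offset.
# It could also match the same bytes inside a stream, so treat results as a hint.
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)")


def find_startxref_offsets(data: bytes) -> List[int]:
    """Return every startxref offset found in the raw PDF bytes, in file order."""
    return [int(m.group(1)) for m in STARTXREF_RE.finditer(data)]


def scan_pdf(path: str) -> List[int]:
    """Hypothetical helper: read a PDF from disk and return its startxref offsets."""
    with open(path, "rb") as f:
        return find_startxref_offsets(f.read())


# Synthetic file: an original body plus one incremental update.
# The offsets are made-up numbers, not real byte positions in this sample.
# The second trailer's /Prev points back at the first xref section (offset 120).
sample = (
    b"%PDF-1.4\n% (original objects omitted)\n"
    b"trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n120\n%%EOF\n"
    b"% (updated objects omitted)\n"
    b"trailer\n<< /Size 8 /Root 1 0 R /Prev 120 >>\nstartxref\n560\n%%EOF\n"
)

offsets = find_startxref_offsets(sample)
print(offsets)  # → [120, 560]
# A reader working from the end of the file starts at the last offset.
print(offsets[-1])  # → 560
```

Running `scan_pdf` on the problem file would show whether it really has two sections; comparing against the edited file that loads correctly could then help narrow down which section pdfrw ends up using.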

borwickatuw · Dec 24 '20 04:12

FYI, I found a workaround in case it's helpful to anyone: if the PDF is first opened in Acrobat Pro and exported as an optimized PDF, pdfrw can then read the optimized file.

borwickatuw · Jan 21 '21 16:01