pdfrw
Missing pages in opened PDF
I have a particular PDF file that seems to break pdfrw's PDF-DOM mapping: pdfrw is unable to read all of its pages. When I open the file and print the number of pages in the usual way:
from pdfrw import PdfReader, PdfWriter, PdfName
pdf = 'C:\\example.pdf'
input = PdfReader(pdf)
print('Num pages: {n1} and {n2}'.format(n1=input.numPages, n2=input[PdfName('Root')][PdfName('Pages')][PdfName('Count')]))
I get the output:
Num pages: 1 and 1
But the PDF is actually 5 pages long in AcroReader (and in other readers too). Unfortunately I cannot attach the PDF since it contains sensitive information and, if I remove that information with a tool, the edited file loads correctly. Any idea what the problem could be?
I am having this same issue too. I have been investigating it all day today and I think it's due to the PDF having two startxref stanzas. If I can figure it out I'll submit a patch.
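For anyone who wants to check whether their own file fits this pattern, here is a minimal sketch that scans the raw bytes for the startxref keyword. It does not use pdfrw at all, and the function names are made up for illustration. More than one hit typically means the file was saved with incremental updates (or is linearized), but the keyword could in principle also occur inside stream data, so treat the count as a hint rather than proof.

```python
import re


def find_startxref_offsets(data):
    """Return the byte offset of every 'startxref' keyword in raw PDF bytes."""
    return [m.start() for m in re.finditer(rb"startxref", data)]


def count_startxref(path):
    """Hypothetical helper: count 'startxref' keywords in the file at `path`."""
    with open(path, "rb") as f:
        return len(find_startxref_offsets(f.read()))
```

Calling count_startxref('C:\\example.pdf') on the problem file and getting 2 or more would be consistent with the theory above; getting 1 would suggest the cause is something else.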
FYI I found a workaround, in case it's helpful to anyone: if the PDF is first opened in Acrobat Pro and exported as an optimized PDF, pdfrw can then read that optimized PDF.