pdfrw
Did not find PDF object (1, 0)
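from pdfrw import PdfReader, PdfWriter

with open('raw.pdf', 'wb') as pdf_file:
    pdf_file.write(data)
# writer was not defined in the original snippet; assumed to be a PdfWriter
writer = PdfWriter()
writer.addpages(PdfReader('new.pdf').pages)
writer.write("_signed_manifest.pdf")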
The error I get:
[WARNING] tokens.py:221 Indirect object 5 0 obj found at incorrect offset 113236 (expected offset 113178) (line=810, col=1, token='4')
[WARNING] tokens.py:221 Indirect object 4 0 obj found at incorrect offset 113178 (expected offset 112991) (line=795, col=1, token='3')
[WARNING] tokens.py:221 Indirect object 3 0 obj found at incorrect offset 112991 (expected offset 112894) (line=788, col=1, token='2')
[WARNING] tokens.py:221 Indirect object 2 0 obj found at incorrect offset 112894 (expected offset 9) (line=2, col=1, token='1')
[WARNING] tokens.py:221 stream keyword terminated by \r without \n (line=791, col=1, token='stream')
[WARNING] tokens.py:221 Did not find PDF object (1, 0) (line=794, col=1, token='endobj')
Probably a broken file.
A workaround on Windows is to open the 'broken' PDF and print it "as PDF" through Windows. The new 're-printed' PDF works correctly.
I see the same pattern of error messages, and the fork at https://github.com/sarnold/pdfrw doesn't resolve it:
[WARNING] tokens.py:221 Indirect object 5 0 obj found at incorrect offset 430213 (expected offset 430155) (line=2918, col=1, token='4')
[WARNING] tokens.py:221 Indirect object 6 0 obj found at incorrect offset 430264 (expected offset 430213) (line=2925, col=1, token='5')
[WARNING] tokens.py:221 Indirect object 4 0 obj found at incorrect offset 430155 (expected offset 429968) (line=2903, col=1, token='3')
[WARNING] tokens.py:221 Indirect object 3 0 obj found at incorrect offset 429968 (expected offset 429871) (line=2896, col=1, token='2')
[WARNING] tokens.py:221 Indirect object 2 0 obj found at incorrect offset 429871 (expected offset 9) (line=2, col=1, token='1')
[WARNING] tokens.py:221 stream keyword terminated by \r without \n (line=2899, col=1, token='stream')
[WARNING] tokens.py:221 Did not find PDF object (1, 0) (line=2902, col=1, token='endobj')
Anyway, the PDF may have been generated in a "strange" way, but it is not broken, since Windows can open it. Better handling should be possible.
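One way to read the warnings quoted above is to compare the "found" and "expected" offsets. The following is a minimal sketch, assuming the log lines are pasted verbatim into a string; the helper names `parse_offsets` and `shifted_by_one` are made up for illustration and are not part of pdfrw. On these lines, each object's expected offset equals the offset where the previous object number was found, which may point to a cross-reference table shifted by one entry rather than to damaged content.

```python
import re

# Warnings copied from the log quoted in this thread.
LOG = """\
[WARNING] tokens.py:221 Indirect object 5 0 obj found at incorrect offset 430213 (expected offset 430155) (line=2918, col=1, token='4')
[WARNING] tokens.py:221 Indirect object 6 0 obj found at incorrect offset 430264 (expected offset 430213) (line=2925, col=1, token='5')
[WARNING] tokens.py:221 Indirect object 4 0 obj found at incorrect offset 430155 (expected offset 429968) (line=2903, col=1, token='3')
[WARNING] tokens.py:221 Indirect object 3 0 obj found at incorrect offset 429968 (expected offset 429871) (line=2896, col=1, token='2')
[WARNING] tokens.py:221 Indirect object 2 0 obj found at incorrect offset 429871 (expected offset 9) (line=2, col=1, token='1')
"""

PATTERN = re.compile(
    r"Indirect object (\d+) 0 obj found at incorrect offset (\d+) "
    r"\(expected offset (\d+)\)"
)


def parse_offsets(log_text):
    """Return {object_number: (found_offset, expected_offset)}."""
    result = {}
    for match in PATTERN.finditer(log_text):
        num, found, expected = (int(g) for g in match.groups())
        result[num] = (found, expected)
    return result


def shifted_by_one(offsets):
    """Object numbers whose expected offset is where object n-1 was found."""
    return sorted(
        n for n, (_, expected) in offsets.items()
        if n - 1 in offsets and offsets[n - 1][0] == expected
    )


offsets = parse_offsets(LOG)
print(shifted_by_one(offsets))  # [3, 4, 5, 6]
```

Object 2 is not listed only because the log contains no warning for object 1; its expected offset of 9 is the position directly after the `%PDF-1.3` header line, where `1 0 obj` starts in the "bad" file.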
If I open the "bad original" and the version reprinted through Windows in Notepad++, the beginning of each file is interesting:
old bad:
%PDF-1.3
1 0 obj
<</Type /XObject /Subtype /Image /Name /Im1 /Width 1654 /Height 2338 /Length 429678/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter [ /DCTDecode ] >> stream
ÿØÿà
new good:
%PDF-1.7
4 0 obj
<<
/BitsPerComponent 8
/ColorSpace /DeviceRGB
/Filter /DCTDecode
/Height 52
/Length 6405
/Subtype /Image
/Type /XObject
/Width 1654
>>
stream
ÿØÿà
The "bad" version seems to use <CR> as the line ending in its main parts, while "normal" PDFs apparently use <LF>.
==> Could this be a Mac thing?
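The line-ending observation can be checked without an editor. This matters because the PDF specification requires the `stream` keyword to be followed by CRLF or a lone LF, not by a lone CR, which matches the "stream keyword terminated by \r without \n" warning. Below is a minimal sketch using only the Python standard library; the function name `count_stream_terminators` and the sample byte strings are made up for illustration. It is a naive byte scan (it could also match the word inside binary stream data), not a PDF parser.

```python
import re
from collections import Counter

# Match the keyword "stream" (but not "endstream") and capture what follows it.
# CRLF is listed first so it is not mistaken for a lone CR.
STREAM_RE = re.compile(rb"(?<!end)stream(\r\n|\r|\n)?")


def count_stream_terminators(pdf_bytes):
    """Count which end-of-line marker follows each `stream` keyword."""
    names = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF", None: "none"}
    return Counter(names[m.group(1)] for m in STREAM_RE.finditer(pdf_bytes))


# Tiny made-up samples that imitate the two headers shown above.
bad = b"1 0 obj\r<< /Length 4 >> stream\r\xff\xd8\xff\xe0\rendstream\rendobj\r"
good = b"4 0 obj\n<< /Length 4 >>\nstream\n\xff\xd8\xff\xe0\nendstream\nendobj\n"

print(count_stream_terminators(bad))   # Counter({'CR': 1})
print(count_stream_terminators(good))  # Counter({'LF': 1})
```

For a real file, the bytes can be read with `open('raw.pdf', 'rb').read()` and passed to the same function; a non-zero `CR` count would be consistent with the warning in the log.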