pdfsizeopt zlib.error: Error -3 while decompressing: incorrect data check (in PermissiveZlibDecompress)

Another quick error report, this time for this PDF: callrt.gpg.pdf (note: the file is not mine and I do not want it distributed publicly, so I encrypted it to your PGP key with Keybase).

Not a big deal, again, but I'm trying to help fix these few remaining bugs. :-)

(...)
info: written 171577 bytes to PNG
info: will optimize image XObject 278; orig width=1074 height=839 colorspace=/DeviceRGB bpc=8 inv=False filter=/FlateDecode dp=0 size=301132 gs_device=png16m
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/__main__.py", line 1, in <module>
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/m.py", line 6, in <module>
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/pdfsizeopt/main.py", line 5503, in main
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/pdfsizeopt/main.py", line 4216, in OptimizeImages
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/pdfsizeopt/main.py", line 2251, in CompressToZipPng
  File "/usr/local/bin/../Cellar/pdfsizeopt/HEAD-e4c50c8/bin/pdfsizeopt/pdfsizeopt/main.py", line 462, in PermissiveZlibDecompress
zlib.error: Error -3 while decompressing: incorrect data check
zsh: exit 1     pdfsizeopt callrt.pdf

Apr 02 '18 10:04 sbibauw

I have a number of files like this... The workaround that I currently use is to uncompress them with pdftk and only then to process the file with pdfsizeopt.

BTW, this pdftk workaround will soon disappear from Linux distributions, since pdftk depends on GCJ (which has been removed from recent GCC versions)... :disappointed:

Apr 03 '18 02:04 rbrito

Please note that pdfsizeopt is not a repair tool or data rescue tool for broken PDFs. If the input PDF is broken, then pdfsizeopt may fail (or it may produce broken output).

Nevertheless, there may be an easy way to improve things here, so keeping this issue open. A possible improvement: If pdfsizeopt encounters an image object with broken data, it should print a warning, and keep that image object unoptimized (as is). The current behavior is raising a fatal exception (e.g. zlib.error above).
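
A minimal sketch of that possible improvement, not the actual pdfsizeopt code: the helper names (`try_decompress`, `optimize_images`) and the dict-of-streams input are hypothetical and only illustrate the idea of catching `zlib.error`, printing a warning, and keeping the broken image stream as is.

```python
import sys
import zlib


def try_decompress(data):
    """Return the decompressed bytes, or None if the zlib stream is broken."""
    try:
        return zlib.decompress(data)
    except zlib.error as e:
        # Warn instead of aborting the whole run with a traceback.
        sys.stderr.write(
            'warning: broken image data, keeping it unoptimized: %s\n' % e)
        return None


def optimize_images(images):
    """Recompress each stream; keep broken ones unchanged.

    images is a hypothetical dict mapping a PDF object number to its
    /FlateDecode-compressed stream bytes.
    """
    result = {}
    for obj_num, data in sorted(images.items()):
        raw = try_decompress(data)
        if raw is None:
            result[obj_num] = data  # Broken input: keep as is.
        else:
            result[obj_num] = zlib.compress(raw, 9)
    return result
```

With this shape, a stream whose checksum fails (the "incorrect data check" case above) only produces a warning, and the remaining images are still processed.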

Apr 03 '18 13:04 pts

I totally agree! I believe the optimal behaviour would indeed be to ignore any broken/corrupt/erroneous image or data object inside the PDF (leaving it as it is) and still go on compressing the rest. But even a simple error message, like the one shown when a PDF is encrypted (something like "Not possible to process this file because it contains a broken object"), would be great. It is just that the Python error output, with the traceback etc., gives non-technical users (like me) the impression that it was an unforeseen pdfsizeopt failure.
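
For illustration, the simpler variant (a one-line error message instead of a traceback) could look roughly like the sketch below. The wrapper name `run_with_friendly_errors`, the exit code 2, and the way the main function is passed in are all assumptions, not how pdfsizeopt is actually structured.

```python
import sys
import zlib


def run_with_friendly_errors(main_func, argv):
    """Call main_func(argv); report zlib.error as a short message.

    Returns the exit code: whatever main_func returns (0 if None),
    or 2 (an arbitrary choice) if a broken compressed object was hit.
    """
    try:
        return main_func(argv) or 0
    except zlib.error as e:
        sys.stderr.write(
            'error: not possible to process this file because it '
            'contains a broken object (%s)\n' % e)
        return 2
```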

Apr 03 '18 14:04 sbibauw

@rbrito: Could you please file issues for those similar files of yours where running pdftk before pdfsizeopt fixes the problem? Maybe they indicate real (so far undiscovered) bugs in pdfsizeopt.

Feb 24 '23 00:02 pts