pdfsizeopt
"Found empty name token" when trying to optimize a PDF file
Hi, @pts.
I have started to accumulate some new PDF files that cause problems when processed with pdfsizeopt. One of them gives the following error (with the latest checkout from the master branch):
$ ~/Downloads/pdfsizeopt/pdfsizeopt --use-image-optimizer=jbig2 --use-multivalent=no --do-optimize-images=no relativity-demystified.pdf
info: This is pdfsizeopt rUNKNOWN size=390796.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: relativity-demystified.pdf
info: loaded PDF of 1950732 bytes
info: using Ghostscript TMPDIR=/home/rbrito/tmp TEMP=/home/rbrito/tmp gs: GPL Ghostscript 9.22 (2017-10-04)
info: decompressing 8042 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Predictor 12/Columns 5>>
info: found 4590 obj offsets and 19 obj streams in xref stream
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 9431, in main
    ).Load(file_name)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 4499, in Load
    do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 4905, in ParseUsingXref
    xref_ofs, xref_obj_num, xref_generation)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 4852, in ParseUsingXrefStream
    '%d 0 obj\n%s\nendobj\n' % (obj_num, compressed_obj_headbufs[i]))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 1358, in __init__
    sys.exc_info()[2])
pdfsizeopt.main.PdfTokenParseError: In obj data between ofs 0 and 346: Found empty name token.
rbrito@abreu:/tmp$
Since the file in question is copyrighted, I will send you a link to it privately.
Thanks in advance,
Rogério Brito.
Object 126 in the input file contains /ColorSpace<<//DeviceGray/CS1. This is invalid PDF, because there is a double slash in front of DeviceGray. Section 3.2.4 of pdf_reference_1-7.pdf doesn't allow multiple slashes at the start of names. pdfsizeopt is right to report a syntax error here.
Do you have any better recommendation? Should pdfsizeopt be lenient and ignore one of the slashes?
What do other PDF viewers do?
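To make the "ignore one of the slashes" option concrete, here is a minimal sketch of what a lenient pre-pass could look like. It is not pdfsizeopt code: the function name collapse_duplicate_slashes is hypothetical, and the sketch assumes it is handed only the object's head data (dictionaries and other tokens), never raw stream bytes. It drops every slash that is immediately followed by another slash, while leaving literal strings in parentheses and % comments untouched.

```python
def collapse_duplicate_slashes(data: bytes) -> bytes:
    """Drop each '/' that is directly followed by another '/'.

    Hypothetical helper, not part of pdfsizeopt. Literal strings '(...)'
    and comments '%...' are copied verbatim. Assumes `data` contains no
    raw stream bytes.
    """
    out = bytearray()
    i, n = 0, len(data)
    depth = 0  # nesting depth of literal-string parentheses
    while i < n:
        c = data[i:i + 1]
        if depth > 0:
            # Inside a literal string: copy bytes unchanged.
            if c == b'\\' and i + 1 < n:
                out += data[i:i + 2]  # keep the escape pair together
                i += 2
                continue
            if c == b'(':
                depth += 1
            elif c == b')':
                depth -= 1
            out += c
        elif c == b'(':
            depth = 1
            out += c
        elif c == b'%':
            # Comment: copy up to (not including) the end of the line.
            j = i
            while j < n and data[j:j + 1] not in (b'\r', b'\n'):
                j += 1
            out += data[i:j]
            i = j
            continue
        elif c == b'/' and data[i + 1:i + 2] == b'/':
            pass  # skip this slash; the next one starts the name
        else:
            out += c
        i += 1
    return bytes(out)


# The fragment quoted from object 126:
fixed = collapse_duplicate_slashes(b'/ColorSpace<<//DeviceGray/CS1')
```

With this sketch, the quoted fragment becomes /ColorSpace<</DeviceGray/CS1. Whether silently repairing the input is preferable to reporting the error (or to repairing it with a warning) is a separate design decision.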
Hi, @pts... Good question.
The other PDF viewers seem to be lenient and allow a file like this to be viewed (but, to be honest, I have not read their code to see what measures they take)...
I think that I may have more than one file with this error, but I will have to double check...
OTOH, I understand that this is a corner case for dealing with invalid PDFs (I didn't know that the file was broken before your warning, though).