pdfsizeopt

Converted PDF has broken pages in Acrobat Pro DC

Open galaxy001 opened this issue 5 years ago • 9 comments

I removed the broken XMP item and got the result with ../pdfsizeopt.single nnGm.pdf nnGmo2.pdf 2>nnGmo2.log.

nnGmo2.pdf can be viewed in the macOS Preview app, but in Acrobat Pro DC 2018 some pages are broken, for example the pages labeled xx, 68, 69, 70, and 226-230.

The PDF structure view also breaks after page 21.


nnGmo2.log nnGmo2.pdf

galaxy001, Mar 19 '19 05:03

Thank you for reporting this in detail! This may indicate multiple bugs in pdfsizeopt.

Unfortunately I don't have a license for Acrobat Pro DC 2018, so I can't reproduce the problem. The indicated pages of nnGmo2.pdf work for me in Google Chrome and Evince.

Do you have more detailed error messages from Acrobat Pro DC 2018?

You may want to diagnose this further. First, try pdfsizeopt.single --use-pngout=no, to make the image processing faster. Then try pdfsizeopt.single --use-pngout=no --do-optimize-fonts=no. Does this fix all the problems you were encountering?

FYI, the full list of useful flags to try is: pdfsizeopt.single --do-optimize-images=no --do-optimize-fonts=no --do-optimize-objs=no --do-optimize-streams=no --do-decompress-most-streams=yes --do-generate-xref-stream=no --do-generate-object-stream=no
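
One way to narrow this down is to add the flags one at a time and check each output in Acrobat. The following is only a sketch: the flags are the ones listed above, but the cumulative ordering and the output names (nnGm.step0.pdf and so on) are assumptions made for illustration, not something pdfsizeopt prescribes.

```python
"""Build pdfsizeopt command lines that add one diagnostic flag per step."""
import shlex

# Flags exactly as listed in the comment above, in the same order.
DIAGNOSTIC_FLAGS = [
    "--do-optimize-images=no",
    "--do-optimize-fonts=no",
    "--do-optimize-objs=no",
    "--do-optimize-streams=no",
    "--do-decompress-most-streams=yes",
    "--do-generate-xref-stream=no",
    "--do-generate-object-stream=no",
]


def bisect_commands(src, tool="pdfsizeopt.single"):
    """Return (output_name, argv) pairs; step i applies the first i flags.

    The ".stepN.pdf" naming is hypothetical, chosen only for this sketch.
    """
    stem = src[:-4] if src.lower().endswith(".pdf") else src
    steps = []
    for i in range(len(DIAGNOSTIC_FLAGS) + 1):
        out = "%s.step%d.pdf" % (stem, i)
        steps.append((out, [tool] + DIAGNOSTIC_FLAGS[:i] + [src, out]))
    return steps


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for out, argv in bisect_commands("nnGm.pdf"):
        print(shlex.join(argv))
```

The first output that opens correctly in Acrobat points at the flag added in that step, and hence at the optimization stage that is likely involved.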

pts, Mar 19 '19 12:03

Even the last one (pdfsizeopt.single --do-optimize-images=no --do-optimize-fonts=no --do-optimize-objs=no --do-optimize-streams=no --do-decompress-most-streams=yes --do-generate-xref-stream=no --do-generate-object-stream=no) produces a file that does not work in Acrobat.

However, after converting the file to QDF with qpdf, it works.

But when I then process the QDF with the last command above, it breaks again.


Could you use the free trial license to give it a try?
https://acrobat.adobe.com/us/en/free-trial-download.html

I don't know how to get detailed error messages from Acrobat Pro DC 2018 yet.

galaxy001, Mar 20 '19 04:03

I got a fix with 'qpdf':

qpdf --decode-level=none --normalize-content=y fo.pdf for.pdf
qpdf --decode-level=none for.pdf forr.pdf

$ ls -1s fo.pdf for.pdf forr.pdf
 9984 fo.pdf
18440 for.pdf
10248 forr.pdf
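
For anyone who wants to script this workaround, here is a minimal sketch of the same two qpdf passes. The qpdf options are the ones shown above; the helper names qpdf_two_pass and run_all are made up for this example.

```python
import subprocess


def qpdf_two_pass(src, normalized, final):
    """Return the two qpdf argv lists from the comment above, in order."""
    return [
        # Pass 1: keep streams encoded as they are, but normalize content streams.
        ["qpdf", "--decode-level=none", "--normalize-content=y", src, normalized],
        # Pass 2: rewrite again without content normalization.
        ["qpdf", "--decode-level=none", normalized, final],
    ]


def run_all(commands):
    """Run each command; requires qpdf on PATH."""
    for argv in commands:
        # check=True raises CalledProcessError on any non-zero exit status
        # (qpdf also exits non-zero when it only emits warnings).
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(qpdf_two_pass("fo.pdf", "for.pdf", "forr.pdf"))
```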

galaxy001, Mar 20 '19 08:03

Can you please convert fo.pdf, for.pdf and forr.pdf with pdfsizeopt, upload all 3*2 files to this issue, and declare which work in Acrobat and which don't? It would be awesome to have such short example PDF files which don't work in Acrobat.

Having uploaded the files here you may also want to report the bug to Adobe, and wait for analysis and comments of the Adobe engineers. Currently (without a meaningful error message from Adobe Acrobat) it's not obvious whether pdfsizeopt or Adobe Acrobat has the bug.

I'm developing pdfsizeopt on Linux. In order to use Adobe Acrobat, I'd have to give my credit card details to Adobe (that's the smaller issue), and I'd have to either buy a Mac or install Windows on one of my existing computers (or in a VM). I'm willing to do this only if I'm compensated for the licenses and the work in advance.

pts, Mar 20 '19 10:03

man ls

     -s      Display the number of file system blocks actually used by each file, in units of 512 bytes, where
             partial units are rounded up to the next integer value.  If the output is to a terminal, a total
             sum for all the file sizes is output on a line before the listing.  The environment variable
             BLOCKSIZE overrides the unit size of 512 bytes.
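
To make the numbers from the earlier ls -1s listing concrete, this small sketch converts them to bytes, assuming the default 512-byte unit from the man page excerpt (a BLOCKSIZE setting would change that). Because ls -s reports allocated blocks rounded up, the results are upper bounds on the file sizes: 5,111,808 bytes for fo.pdf, 9,441,280 for for.pdf, and 5,246,976 for forr.pdf, so forr.pdf is roughly 2.6% larger than fo.pdf.

```python
# Default unit of `ls -s` per the man page excerpt above; BLOCKSIZE may override it.
BLOCK_BYTES = 512

# Block counts copied from the `ls -1s` output earlier in the thread.
listing = {"fo.pdf": 9984, "for.pdf": 18440, "forr.pdf": 10248}


def blocks_to_bytes(blocks, block_bytes=BLOCK_BYTES):
    """Upper bound on the file size, since partial blocks are rounded up."""
    return blocks * block_bytes


sizes = {name: blocks_to_bytes(count) for name, count in listing.items()}

# Relative growth of the final file compared with the pdfsizeopt output.
growth = listing["forr.pdf"] / listing["fo.pdf"] - 1

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name}: at most {size} bytes")
    print(f"forr.pdf vs fo.pdf: {growth:+.1%}")
```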

I tried extracting some pages. When there are only a few pages, everything works, so I use this file set:

pdfsizeopt --do-keep-font-optionals=yes --do-regenerate-all-fonts=no --do-double-check-type1c-output=yes --do-ignore-generation-numbers=no --do-optimize-objs=no --use-multivalent=yes ex.pdf exo.pdf
qpdf --decode-level=none --normalize-content=y exo.pdf exor.pdf
qpdf --decode-level=none exor.pdf exorr.pdf

ex.pdf exo.pdf exor.pdf exorr.pdf

Acrobat shows that page 12 is missing in exo.pdf.

galaxy001, Mar 21 '19 07:03

I have the same issue. Thank you @galaxy001 for the help: qpdf --decode-level=none --normalize-content=y fixes the file and doesn't even increase the file size.

@pts, the error message in Adobe Acrobat Pro is: Expected a dict object.

babinslava, Oct 18 '19 19:10

Thank you for reporting this. I'd love to debug and fix this, but unfortunately I don't have a copy of Acrobat Pro DC. The error message "Expected a dict object." is helpful, but not specific enough: it could take hours or days to debug by trial and error. Any contributions are welcome.

pts, Dec 12 '19 17:12

Nevertheless, it's still worth investigating what difference qpdf --decode-level=none --normalize-content=y makes to the PDF file. Maybe pdfsizeopt itself could do it.
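
A possible starting point for that investigation, as a rough sketch only: rewrite both files in qpdf's QDF form and diff the results as text. The helper names here are invented for the example. One caveat: as reported earlier in this thread, converting to QDF already made the file work in Acrobat, so qpdf may repair the defect while rewriting, and the diff may show only what content normalization changes rather than the raw problem.

```python
import difflib
import subprocess


def qdf_command(src, dst):
    """qpdf argv that rewrites src in QDF form (meant to be readable in a text editor)."""
    return ["qpdf", "--qdf", src, dst]


def diff_text(before, after, before_name="before", after_name="after"):
    """Unified diff of two byte strings, decoded as Latin-1 so no byte is lost."""
    a = [line.decode("latin-1") for line in before.splitlines(keepends=True)]
    b = [line.decode("latin-1") for line in after.splitlines(keepends=True)]
    return list(difflib.unified_diff(a, b, before_name, after_name))


def diff_pdfs(broken, fixed):
    """Convert both PDFs to QDF and diff them; requires qpdf on PATH."""
    for src in (broken, fixed):
        subprocess.run(qdf_command(src, src + ".qdf"), check=True)
    with open(broken + ".qdf", "rb") as f1, open(fixed + ".qdf", "rb") as f2:
        return diff_text(f1.read(), f2.read(), broken, fixed)


if __name__ == "__main__":
    # File names taken from the example set uploaded earlier in the thread.
    print("".join(diff_pdfs("exo.pdf", "exor.pdf")))
```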

pts, Feb 23 '23 02:02

Hi Péter,

I keep a copy of the last Adobe Reader for Linux (9.5.5) around, and just for your information, that program also doesn't show page 12 (and when I scroll down that far acroread spits out an error message "There was a problem reading this document (14).").

Also, my version of pdftk can't process this exo.pdf file.

To any Linux users reading this: you need quite a number of 32-bit libraries installed on your system to use Adobe Reader 9.5.5, and I have been told there are some security bugs in this version. So use it at your own risk.

Ndolam, Feb 23 '23 15:02