pdfsizeopt
Fails with out-of-memory error on a very large PDF file
I have a PDF file that is 1.3 GB in size (it's a master's thesis, which is why I'm not attaching it here). Okular can handle it pretty well, but it crashes Adobe Reader. pdfsizeopt also crashes, with a memory error:
info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /home/ludee/Programs/pdfsizeopt/pdfsizeopt_libexec
info: loading PDF from: /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf
info: loaded PDF of 1322590721 bytes
info: separated to 2269032 objs + xref + trailer
Traceback (most recent call last):
File "/proc/self/exe/runpy.py", line 162, in _run_module_as_main
File "/proc/self/exe/runpy.py", line 72, in _run_code
File "./pdfsizeopt.single/__main__.py", line 1, in <module>
File "./pdfsizeopt.single/m.py", line 6, in <module>
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 5622, in main
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 2664, in Load
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 689, in __init__
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 942, in Get
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 1217, in ParseDict
File "./pdfsizeopt.single/pdfsizeopt/main.py", line 1148, in ParseSimpleValue
MemoryError
@LudeeD said: "I have a PDF file that is 1.3 GB in size"
More information please:
pdfinfo /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf
And see https://github.com/pts/pdfsizeopt/issues/119
More info
Title:
Subject:
Keywords:
Author:
Creator: LaTeX with hyperref
Producer: pdfTeX-1.40.19
CreationDate: Sun Jun 30 21:11:45 2019 WEST
ModDate: Sun Jun 30 21:11:45 2019 WEST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 308
Encrypted: no
Page size: 595.276 x 841.89 pts (A4)
Page rot: 0
File size: 1322590721 bytes
Optimized: no
PDF version: 1.5
Following the instructions in #119, cpdf also failed with an out-of-memory error:
Initial file size is 1322590721 bytes
Beginning squeeze: 2269033 objects
Fatal error: out of memory.
@LudeeD said: "Pages: 308, File size: 1322590721 bytes"
1322590721 / 308 ≈ 4294126 bytes/page. Hmm, that's big!
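The bytes-per-page arithmetic above can be double-checked with a quick snippet (the numbers come straight from the pdfinfo output earlier in the thread):

```python
# Quick check of the bytes-per-page figure from the pdfinfo output above.
file_size = 1322590721  # "File size" from pdfinfo, in bytes
pages = 308             # "Pages" from pdfinfo

per_page = file_size / pages
print(round(per_page))                      # ≈ 4294126 bytes/page
print(round(per_page / (1024 * 1024), 1))   # ≈ 4.1 MiB per page
```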
You can convert /FlateDecode images (PNG-like, lossless) to /DCTDecode (JPEG-like, lossy) using Ghostscript:
ps2pdf /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.gs.pdf
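If plain ps2pdf is too slow or doesn't shrink the file enough, Ghostscript's preset profiles downsample images more aggressively. A command sketch (assumes Ghostscript is installed; the file names are illustrative):

```shell
# The /ebook preset downsamples images to ~150 dpi and JPEG-compresses them;
# /screen (~72 dpi) shrinks more at lower quality.
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET \
   -sOutputFile=thesis.small.pdf thesis.pdf
```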
After it had been running for 3 hours I gave up on this. I rebuilt the PDF with compressed versions of the images, and now it's a more reasonable size.
Feel free to close this issue if handling >1 GB files is not really a priority.
Thanks for the help
Can you share this file? It sure sounds interesting and I would like to have a look at it.
Thanks,
Rogério Brito.
@rbrito said: "It sure sounds interesting and I would like to have a look at it."
Use pdftk to process the file in parts.
pdfsizeopt indeed uses a lot of memory for large PDF files, because it keeps the parsed version of the entire PDF file in memory. It also keeps multiple versions of compressed image data in memory for the current image being optimized.
Throwing more memory at it should make it work. Unfortunately there is no easy estimate for the total required memory for a given input file.
In the meantime, splitting the PDF file on some page boundary (with pdftk or qpdf), running pdfsizeopt on the split PDF files individually, and joining the results may work for some PDFs.
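The split/optimize/join workflow described above can be sketched in Python. This is a sketch, not part of pdfsizeopt: the file names, chunk size, and `part_N.pdf` naming are illustrative, though the `qpdf --empty --pages in.pdf A-B -- out.pdf` syntax it builds is qpdf's real page-extraction form.

```python
# Sketch: build qpdf commands that split a 308-page PDF into chunks,
# so each chunk can be run through pdfsizeopt separately and re-joined.
# File names and chunk size are illustrative.

def page_ranges(total_pages, chunk_size):
    """Yield (first, last) 1-based page ranges covering total_pages."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)

def split_commands(src, total_pages, chunk_size=50):
    """Build one qpdf extraction command per chunk, writing part_N.pdf."""
    cmds = []
    for i, (a, b) in enumerate(page_ranges(total_pages, chunk_size)):
        cmds.append(["qpdf", "--empty", "--pages", src, f"{a}-{b}", "--",
                     f"part_{i}.pdf"])
    return cmds

cmds = split_commands("thesis.pdf", 308, 50)
print(len(cmds))   # 7 chunks for 308 pages at 50 pages each
print(cmds[0])     # ['qpdf', '--empty', '--pages', 'thesis.pdf', '1-50', '--', 'part_0.pdf']
print(cmds[-1])    # last chunk covers pages 301-308
```

Each command could be run with `subprocess.run`, the resulting `part_N.pdf` files optimized individually, and the results joined back, e.g. with `qpdf --empty --pages part_*.pdf -- joined.pdf`.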
I'm keeping this issue open as a reminder to add memory optimizations.