OCRmyPDF
[Bug]: Large file size increases due to PDF/A font substitution
Describe the bug
In my case, creating a PDF/A increased the file size by a factor of 500.
- I believe I identified the culprit: gs cannot handle mixed portrait and landscape pages well. After separating the portrait and landscape pages into two separate files, ocrmypdf performed extremely well and reduced the size of each file. UPDATE: this was true for one set of typical files, but not for others.
- A possible solution would be to split each file into single-page files, run ocrmypdf (and hence gs) on each, and merge the results again. UPDATE: this does not solve the problem.
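For anyone who wants to repeat the per-page experiment (which, as noted, did not fix the size increase here), the following is a minimal sketch of how it can be scripted. It assumes qpdf and ocrmypdf are on the PATH; the function names, the `page-%d.pdf` naming pattern, and the working directory are hypothetical choices for illustration, not part of OCRmyPDF.

```python
import subprocess
from pathlib import Path


def split_cmd(src, pattern):
    # qpdf replaces %d in the output name with the page number.
    return ["qpdf", "--split-pages", str(src), str(pattern)]


def ocr_cmd(src, dst):
    # Same ocrmypdf option as in the report, minus -v.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def merge_cmd(parts, dst):
    # Concatenate the given files, in order, into one PDF.
    return ["qpdf", "--empty", "--pages", *[str(p) for p in parts], "--", str(dst)]


def page_number(path):
    # "page-03.pdf" -> 3, so pages are merged in numeric order.
    return int(Path(path).stem.rsplit("-", 1)[-1])


def ocr_per_page(src, dst, workdir):
    # Split, OCR each page separately, then merge the results.
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_cmd(src, workdir / "page-%d.pdf"), check=True)
    pages = sorted(workdir.glob("page-*.pdf"), key=page_number)
    results = []
    for page in pages:
        out = workdir / f"ocr-{page.name}"
        subprocess.run(ocr_cmd(page, out), check=True)
        results.append(out)
    subprocess.run(merge_cmd(results, dst), check=True)
```

Calling `ocr_per_page("input.pdf", "output.pdf", "work")` would run the whole sequence; the command builders are kept separate so the exact command lines can be inspected before anything is executed.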
Steps to reproduce
1. Run ocrmypdf -v --skip-text input.pdf output.pdf
I also tried many other parameters; the output was about the same size every time.
gs took minutes to create the multi-megabyte files.
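To put a number on the growth when comparing parameter combinations, a small helper like the one below may be useful. It is only a sketch: the function name `size_ratio` is hypothetical, and it simply compares the on-disk sizes of the input and output files.

```python
import os


def size_ratio(original, converted):
    # Ratio of output size to input size; values above 1 mean the file grew.
    before = os.path.getsize(original)
    after = os.path.getsize(converted)
    if before == 0:
        raise ValueError("original file is empty")
    return after / before
```

For example, `size_ratio("input.pdf", "output.pdf")` returns a value near 500 for a file that grew the way the one in this report did.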
Files
Here are the JSON representations produced with qpdf --json <>: Monatsbericht zum 30.06.2023-json.pdf, Monatsbericht zum 30.06.2023-ocr-json.pdf
The log file: ocrmypdf.log. The encrypted original file: test.zip
shows the size after ocrmypdf
How did you download and install the software?
Homebrew
OCRmyPDF version
16.4.2
Relevant log output
No response