[Bug]: Large file size increases due to PDF/A font substitution

Open ferdiga opened this issue 6 months ago • 9 comments

Describe the bug

In my case, creating a PDF/A increased the file size by a factor of 500!

  • I believe I have identified the culprit: gs cannot handle mixed portrait and landscape pages well. After separating the portrait and landscape pages into two separate files, ocrmypdf performed extremely well and reduced the size of each file. THIS WAS TRUE FOR ONE SET OF TYPICAL FILES, BUT NOT FOR OTHERS.

  • A possible solution would be to split each file into single-page files, run ocrmypdf (and hence gs) on each, and then put them back together. THIS DOES NOT SOLVE THE PROBLEM.
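
For anyone who wants to repeat the per-page experiment described above, here is a minimal Python sketch, not the reporter's actual script. It assumes the qpdf and ocrmypdf command-line tools are on PATH. The helper names (split_cmd, ocr_cmd, merge_cmd, page_number, run_per_page) and the page-%d.pdf naming pattern are hypothetical and chosen only for illustration. Note that, per the report, this approach did not fix the size increase.

```python
import re
import subprocess
from pathlib import Path


def split_cmd(src: str, pattern: str) -> list[str]:
    # qpdf replaces %d in the output pattern with the page number.
    return ["qpdf", "--split-pages", src, pattern]


def ocr_cmd(src: str, dst: str) -> list[str]:
    # Same flag as in the report: skip pages that already contain text.
    return ["ocrmypdf", "--skip-text", src, dst]


def merge_cmd(parts: list[str], dst: str) -> list[str]:
    # Start from an empty PDF and append all pages of each part in order.
    return ["qpdf", "--empty", "--pages", *parts, "--", dst]


def page_number(path: Path) -> int:
    # Extract the trailing number from names such as "page-07.pdf".
    match = re.search(r"(\d+)\.pdf$", path.name)
    if match is None:
        raise ValueError(f"no page number in {path.name}")
    return int(match.group(1))


def run_per_page(src: str, workdir: str, dst: str) -> None:
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_cmd(src, str(work / "page-%d.pdf")), check=True)
    # Sort numerically so the merged file keeps the original page order.
    pages = sorted(work.glob("page-*.pdf"), key=page_number)
    processed = []
    for page in pages:
        out = work / f"ocr-{page.name}"
        subprocess.run(ocr_cmd(str(page), str(out)), check=True)
        processed.append(str(out))
    subprocess.run(merge_cmd(processed, dst), check=True)
```

A call such as run_per_page("input.pdf", "work", "output.pdf") would run the whole round trip; the three *_cmd helpers only build argument lists, so they can be inspected without touching any files.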

Steps to reproduce

1. Run ocrmypdf -v --skip-text input.pdf output.pdf
BTW, I tried many other parameters; the output was about the same size every time.
gs took minutes to create the multi-MB files.
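
To make a comparison like this reproducible, the following Python sketch runs ocrmypdf with a few option sets and reports how much each output grew relative to the input. It is an illustration under stated assumptions, not what the reporter ran: the VARIANTS table and the helper names (growth_factor, build_cmd, compare) are hypothetical, and the issue does not say which parameters were actually tried. The flags --output-type and --optimize are existing ocrmypdf options; as far as I understand, --output-type pdf avoids the Ghostscript PDF/A conversion, which is why it is included here.

```python
import os
import subprocess

# Hypothetical option sets to compare; adjust to whatever you want to test.
VARIANTS = {
    "pdfa-default": ["--skip-text"],
    "plain-pdf": ["--skip-text", "--output-type", "pdf"],
    "plain-pdf-no-opt": ["--skip-text", "--output-type", "pdf", "--optimize", "0"],
}


def growth_factor(before: int, after: int) -> float:
    # Ratio of output size to input size; 500.0 means 500 times larger.
    if before <= 0:
        raise ValueError("input size must be positive")
    return after / before


def build_cmd(options: list[str], src: str, dst: str) -> list[str]:
    return ["ocrmypdf", *options, src, dst]


def compare(src: str, out_dir: str) -> dict[str, float]:
    os.makedirs(out_dir, exist_ok=True)
    before = os.path.getsize(src)
    results = {}
    for name, options in VARIANTS.items():
        dst = os.path.join(out_dir, f"{name}.pdf")
        subprocess.run(build_cmd(options, src, dst), check=True)
        results[name] = growth_factor(before, os.path.getsize(dst))
    return results
```

Calling compare("input.pdf", "out") returns a mapping from variant name to growth factor, which makes it easy to see whether only the PDF/A variants are affected.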

Files

Here are the JSON representations produced with qpdf --json <>: Monatsbericht zum 30.06.2023-json.pdf and Monatsbericht zum 30.06.2023-ocr-json.pdf

The log file: ocrmypdf.log. The encrypted original file: test.zip

Screenshot showing the file sizes after running ocrmypdf: 20240803 100232 ocr_test

How did you download and install the software?

Homebrew

OCRmyPDF version

16.4.2

Relevant log output

No response

ferdiga, Aug 03 '24 08:08