
Add optimization of /CalRGB and /CalGray images

Open rbrito opened this issue 5 years ago • 3 comments

Hi, @pts.

Perhaps you consider this the same issue as #102, perhaps not.

I had a file that contained only bilevel images, stored as Flate-compressed (deflated) 8-bit RGB data, with the following dictionary before the actual stream:

      <<
        /ColorSpace [/CalRGB
          <<
            /Gamma [2.2 2.2 2.2]
            /WhitePoint [0.95043 1 1.09]
            /Matrix [0.41239 0.21264 0.01933 0.35758 0.71517 0.11919 0.18045 0.07218 0.9504]
          >>]
        /Height 3093
        /Subtype /Image
        /Filter /FlateDecode
        /DecodeParms
          <<
            /Columns 2216
            /Colors 3
            /Predictor 15
            /BitsPerComponent 8
          >>
        /Width 2216
        /BitsPerComponent 8
        /Length 341433
      >>

When I ran pdfsizeopt, it didn't touch those images. I'm attaching a page from this document here.

I'm also attaching a page that I produced with a crude method: I extracted the image with pdfimages, wrapped it with img2pdf, and then compressed the result with pdfsizeopt. The difference in size is amazing: from 342 kB to 42 kB, in other words, only about 12% of the original size!

The files are visually identical (as far as diffpdf is concerned), but this method has the huge drawback of throwing away any text on the page (only the scanned image survives), and it only works if all the pages are scans.

Thanks,

Rogério Brito.

p-010.pdf p-010.pso.pdf b.pdf b.pso.pdf

rbrito, Dec 10 '18 22:12

Thank you for reporting this!

The /CalRGB colorspace is not supported by pdfsizeopt. This code explicitly skips unsupported colorspaces:

      if not re.match(r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
                      r'/Indexed[\0\t\n\r\f ]*'
                      r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])', colorspace):
        continue
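To see which /ColorSpace values get through, the same pattern can be tried on its own. A small Python sketch (the function name and the sample values are ours, not pdfsizeopt's):

```python
import re

# The pattern from the snippet above, wrapped in a function for experimenting.
SUPPORTED_COLORSPACE_RE = re.compile(
    r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
    r'/Indexed[\0\t\n\r\f ]*'
    r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])')


def is_supported_colorspace(colorspace):
    """Return True if the check would NOT skip an image with this /ColorSpace."""
    return SUPPORTED_COLORSPACE_RE.match(colorspace) is not None


# The /ColorSpace value from the report, with the calibration dict abridged.
CALRGB = '[/CalRGB \n<<\n/Gamma [2.2 2.2 2.2]\n>>]'

for cs in ('/DeviceRGB', '[/Indexed /DeviceGray 1 <00ff>]', CALRGB):
    print(repr(cs), is_supported_colorspace(cs))
```

The /CalRGB value from the report does not match, so the loop reaches `continue` and the image is left alone.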

Adding support would be possible, but not trivial. Since there is no simple conversion between /CalRGB and /DeviceGray (etc.), all image optimizers which change the colorspace have to be disabled for such images.

An alternative to the above is converting from /CalRGB to /DeviceRGB before optimizing the image. Ideally, we'd get a printing expert's opinion about how much print quality degrades when converting from /CalRGB to /DeviceRGB. (The fact that diffpdf doesn't show any differences can be misleading: the color differences may be more subtle, not representable in 8 bits.)
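For reference, the PDF specification defines a /CalRGB color through CIE XYZ: each component A, B, C is raised to the matching /Gamma exponent, and the results are combined linearly with the /Matrix coefficients [XA YA ZA XB YB ZB XC YC ZC]. /DeviceRGB carries no such definition, so its rendering is up to the viewer or printer. A minimal Python sketch of that formula (the function name is ours), evaluated with the values from the image dictionary in the report:

```python
def calrgb_to_xyz(abc, gamma, matrix):
    """Convert a /CalRGB color (A, B, C), each in [0, 1], to CIE XYZ.

    gamma  = [GR, GG, GB]                          (the /Gamma entry)
    matrix = [XA, YA, ZA, XB, YB, ZB, XC, YC, ZC]  (the /Matrix entry)
    """
    # Apply the per-component gamma exponents first.
    a, b, c = (v ** g for v, g in zip(abc, gamma))
    xa, ya, za, xb, yb, zb, xc, yc, zc = matrix
    # Then combine them linearly with the matrix coefficients.
    return (xa * a + xb * b + xc * c,
            ya * a + yb * b + yc * c,
            za * a + zb * b + zc * c)


GAMMA = [2.2, 2.2, 2.2]
MATRIX = [0.41239, 0.21264, 0.01933, 0.35758, 0.71517, 0.11919,
          0.18045, 0.07218, 0.9504]

# Full-intensity white: prints values close to the /WhitePoint [0.95043 1 1.09].
print(calrgb_to_xyz((1.0, 1.0, 1.0), GAMMA, MATRIX))
```

Full white landing almost exactly on the dictionary's /WhitePoint is a quick consistency check on the calibration values; it says nothing by itself about how a printer treats the same samples as /DeviceRGB.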

pts, Dec 11 '18 10:12

I can confirm that just changing the /ColorSpace value to /DeviceRGB in p-010.pdf makes the output of pdfsizeopt much smaller (info: generated 42726 bytes (12%)). However, this change is not safe, because it can also affect the visual appearance of the image, and by design pdfsizeopt doesn't change the visual appearance.

Nevertheless we could enable such unsafe changes with a command-line flag.
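Such an opt-in could look roughly like the sketch below, which mirrors the manual edit of p-010.pdf described above. `allow_unsafe` is a hypothetical parameter standing in for the command-line flag (it is not an existing pdfsizeopt option), and only a top-level [/CalRGB ...] or [/CalGray ...] value is handled, not the /Indexed forms.

```python
import re

# Matches a /ColorSpace value that is a top-level [/CalRGB ...] or [/CalGray ...]
# array. The /Indexed forms are deliberately not handled in this sketch.
_CAL_RE = re.compile(r'\[[\0\t\n\r\f ]*/Cal(RGB|Gray)[\0\t\n\r\f (<\[/]')


def maybe_drop_calibration(colorspace, allow_unsafe=False):
    """Return /DeviceRGB or /DeviceGray for a calibrated color space if the
    (hypothetical) unsafe option is enabled; otherwise return it unchanged.

    Dropping the calibration may change the rendered colors, which is why
    this must stay opt-in.
    """
    if allow_unsafe:
        match = _CAL_RE.match(colorspace)
        if match:
            return '/Device' + match.group(1)
    return colorspace


cs = '[/CalRGB \n<<\n/Gamma [2.2 2.2 2.2]\n>>]'
print(maybe_drop_calibration(cs, allow_unsafe=True))
```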

pts, Dec 11 '18 10:12

Good news: it is possible to add support for these color spaces to pdfsizeopt in a safe way, keeping the existing image optimizers (sam2p, jbig2, pngout etc.) and without introducing visible changes:

  • [/CalGray ...]
  • [/CalRGB ...]
  • [/Indexed [/CalGray ...] ...]
  • [/Indexed [/CalRGB ...] ...]
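The four forms listed above could be accepted by extending the check quoted earlier. A sketch of such an extended pattern (hypothetical, not code from pdfsizeopt), reusing the separator and terminator characters of the original:

```python
import re

# Separator and terminator classes copied from the original pdfsizeopt pattern.
WS = r'[\0\t\n\r\f ]'
END = r'[\0\t\n\r\f (<\[/]'

# Hypothetical extended pattern that also accepts the calibrated forms.
EXTENDED_COLORSPACE_RE = re.compile(
    r'(?:/Device(?:RGB|Gray)\Z'                        # /DeviceRGB, /DeviceGray
    + r'|\[' + WS + r'*/Cal(?:RGB|Gray)' + END         # [/CalRGB ...], [/CalGray ...]
    + r'|\[' + WS + r'*/Indexed' + WS + r'*'           # [/Indexed <base> ...], where
    + r'(?:/Device(?:RGB|Gray)|\[' + WS + r'*/Cal(?:RGB|Gray))'  # <base> is /Device... or [/Cal...
    + END + r')')


def is_supported_colorspace_extended(colorspace):
    """True for /Device{RGB,Gray}, [/Cal{RGB,Gray} ...] and their /Indexed variants."""
    return EXTENDED_COLORSPACE_RE.match(colorspace) is not None


for cs in ('[/CalGray << /Gamma 2.2 >>]',
           '[/Indexed [/CalRGB << >>] 1 <000000ffffff>]',
           '[/ICCBased 7 0 R]'):
    print(repr(cs), is_supported_colorspace_extended(cs))
```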

The trick is to pretend that these are /DeviceGray or /DeviceRGB (or the /Indexed variants of those) while the image optimizers are running, and then keep the original (*Cal*) /ColorSpace value in the PDF object along with the optimized image data. The only problem is the conversion from [/CalRGB ...] to [/CalGray ...] (when the color components within each pixel have the same values), because there is no color formula mapping one to the other. The workaround for this is emitting [/Indexed [/CalRGB ...] ...] instead of [/CalGray ...].
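The trick and the /Indexed workaround can be sketched as two functions. This is a minimal illustration, not pdfsizeopt's implementation: color spaces are assumed to be already parsed into Python values (a name string such as '/DeviceRGB', or a list such as ['/CalRGB', {...}] or ['/Indexed', base, hival, palette_bytes]), the function names are hypothetical, and the optimized samples are assumed to use the default /Decode mapping.

```python
# Hypothetical parsed representation (not pdfsizeopt's internal one): a color
# space is a name string such as '/DeviceRGB', or a list such as
# ['/CalRGB', {...}], ['/CalGray', {...}] or ['/Indexed', base, hival, palette].


def device_surrogate(colorspace):
    """Return the /Device* color space the image optimizers are shown."""
    if isinstance(colorspace, list):
        if colorspace[0] == '/CalRGB':
            return '/DeviceRGB'
        if colorspace[0] == '/CalGray':
            return '/DeviceGray'
        if colorspace[0] == '/Indexed':
            return ['/Indexed', device_surrogate(colorspace[1])] + colorspace[2:]
    return colorspace


def restore_calibrated(original, optimized, bits_per_component=8):
    """Rebuild the output /ColorSpace on the original [/Cal... ...] base.

    original:  the image's original calibrated (possibly /Indexed) color space.
    optimized: the /Device* or ['/Indexed', '/Device*', hival, palette] color
               space an optimizer chose for the new image data.
    """
    base = original[1] if original[0] == '/Indexed' else original
    is_rgb = base[0] == '/CalRGB'
    if isinstance(optimized, list):
        _, device, hival, palette = optimized
        if is_rgb and device == '/DeviceGray':
            # Expand each gray palette entry g to the RGB triple (g, g, g).
            palette = bytes(b for g in palette for b in (g, g, g))
        elif is_rgb != (device == '/DeviceRGB'):
            raise ValueError('unsupported color space change')
        return ['/Indexed', base, hival, palette]
    if is_rgb and optimized == '/DeviceGray':
        # No formula maps [/CalRGB ...] to [/CalGray ...], so emit
        # [/Indexed [/CalRGB ...] ...] whose entry v is the gray (g, g, g)
        # that sample value v meant in /DeviceGray.
        if not 1 <= bits_per_component <= 8:
            raise ValueError('/Indexed allows at most 8 bits per component')
        maxval = (1 << bits_per_component) - 1
        palette = bytes(b for v in range(maxval + 1)
                        for b in (round(v * 255 / maxval),) * 3)
        return ['/Indexed', base, maxval, palette]
    if is_rgb != (optimized == '/DeviceRGB'):
        raise ValueError('unsupported color space change')
    return base
```

For a 1-bit gray result, `restore_calibrated(cal, '/DeviceGray', bits_per_component=1)` yields a two-entry palette (black and white) under the original [/CalRGB ...] base, so the rendered colors stay defined by the original calibration.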

Keeping this issue open to track the implementation of this feature.

pts, Dec 14 '18 15:12