PDF file size too large - option to bypass png image recompression?
I have a problem with dompdf producing larger file sizes than expected. My document is actually very simple: it consists of a single image inserted with an <img> tag and a short text overlaid on it with an absolutely positioned <div> (it's a voucher with a personalized code). The image is mostly line-art, so png is a better choice than jpeg both for file size and quality. I took the time to produce a very well compressed png with special software and I managed to get it down to 55 KB from the original 144 KB. However, the resulting pdf containing only this image is 143 KB. As an experiment I used the unoptimized 144 KB original png as the source and the resulting pdf was 147 KB, which is almost the same. Clearly, dompdf recompresses the image, which causes the file size to go up.
I've already learned here that dompdf reparses png images for certain features. I'm wondering: is it possible to turn off this reparsing as an option? I don't know what features would be missed in doing so, but in my case I don't think I need any special features. It's a non-transparent image on a white background and that's it. I just want my png to be inserted as is into the pdf document. Is this possible?
The extra processing of PNG images is something I'd have to look into more. The initial logic was developed with the PDF 1.3 spec in mind, so there may be some improvements to be had. From a quick look at the code, one thing you might check is whether or not you're saving with transparency. If the image is saved with a color type supporting transparency (even if it isn't used), then Cpdf will create a mask for the image.
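To check which color type a file was actually saved with, the IHDR chunk at the start of the PNG can be read directly. The following is a minimal sketch, not code from dompdf or Cpdf; the function name pngHeaderInfo is made up for illustration. It relies only on the fixed PNG layout (8-byte signature, then the IHDR chunk with width, height, bit depth and color type).

```php
<?php
// Hypothetical helper (not part of dompdf or Cpdf): read the PNG IHDR fields.
function pngHeaderInfo(string $bytes): array
{
    // Every PNG starts with this 8-byte signature; IHDR data ends at offset 29.
    if (strlen($bytes) < 29 || substr($bytes, 0, 8) !== "\x89PNG\r\n\x1a\n") {
        throw new InvalidArgumentException('Not a PNG file');
    }
    // The first chunk must be IHDR; its 13 data bytes start at offset 16.
    if (substr($bytes, 12, 4) !== 'IHDR') {
        throw new InvalidArgumentException('Missing IHDR chunk');
    }
    $info = unpack(
        'Nwidth/Nheight/CbitDepth/CcolorType/Ccompression/Cfilter/Cinterlace',
        substr($bytes, 16, 13)
    );
    // Color types 4 (gray + alpha) and 6 (RGB + alpha) carry an alpha channel.
    // Color type 3 is palette-based and has no alpha channel of its own.
    $info['hasAlphaChannel'] = in_array($info['colorType'], [4, 6], true);
    return $info;
}
```

Calling it on the contents of an image file (for example via file_get_contents) returns the color type and bit depth, which tells you whether the file was saved in a format with an alpha channel.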
The png images I use are colour type 3 and bit depth 8, which apparently is treated as an image with alpha even though there is no transparency. I tried bypassing the masking by hard-coding $is_alpha = false in Cpdf and here are my results with a png 55 KB in size:
- Detected as an alpha image (default) - the pdf is 143 KB.
- Forced $is_alpha = false - the pdf is 138 KB.
So there is some size reduction to be gained, but it is not much. However, there is a noticeable increase in speed with option 2. Somewhat better, but not quite there yet.
However, I was able to achieve very good results when I combined option 2 with using the original png file: the method addImagePng does some magic to the image with GD (I haven't figured out yet what and why) and then writes the image to the $data variable by creating it from scratch with the imagepng() function. Just for testing, I added $data = file_get_contents($file); to load the original image there, and the resulting pdf is 57 KB and it was created very fast. It looks identical to the larger one created without any modifications.
I don't know what the downsides of this modification are, but for such simple cases it looks like a win-win situation. It would be great if dompdf had an option to bypass the masking and to use the original image without modifications.
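The kind of option described above could be sketched roughly as follows. This is not dompdf's actual API: the function name pngBytesForEmbedding and its parameters are hypothetical, and the re-encoding step is passed in as a callback so the sketch stays independent of GD. It only shows the decision itself, which is to keep the original bytes when the image has no transparency and to fall back to the existing processing otherwise. It has only been reasoned about for the simple case in this thread (an opaque 8-bit palette png); whether every png variant can be passed through untouched is an open question.

```php
<?php
// Hypothetical helper, not dompdf's actual API: pick the bytes to hand to the PDF writer.
function pngBytesForEmbedding(string $file, bool $hasTransparency, callable $reencode): string
{
    if ($hasTransparency) {
        // Transparent images still go through the existing flatten/mask step,
        // which the caller supplies as a callback.
        return $reencode($file);
    }
    // Opaque image: keep the already optimized file untouched.
    $data = file_get_contents($file);
    if ($data === false) {
        throw new RuntimeException("Cannot read $file");
    }
    return $data;
}
```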
Some thoughts:
- Masking is (was?) necessary where alpha transparency is used. Here's relevant comment from the code:
png files typically contain an alpha channel. pdf file format or class.pdf does not support alpha blending. on alpha blended images, more transparent areas have a color near black. This appears in the result on not storing the alpha channel. Correct would be the box background image or its parent when transparent. But this would make the image dependent on the background. Therefore create an image with white background and copy in A more natural background than black is white. Therefore create an empty image with white background and merge the image in with alpha blending.
- Cpdf will generate a mask for color type 4 or 6, or color type 3 with a bit depth other than 4. I'm not sure why other bit depths for type 3 automatically assume alpha transparency. We can probably improve that logic per the spec.
- Speed is definitely an issue when using GD (as opposed to IMagick), primarily because the GD-based logic parses the original image pixel by pixel to generate the mask.
- I wonder if the size difference, even with masking disabled, is a matter of Cpdf using a true-color image regardless of the bit depth of the original image.
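On the question of improving the mask logic per the spec: in the PNG format, color types 4 and 6 always carry an alpha channel, while color types 0, 2 and 3 can only be transparent if the file contains a tRNS chunk, regardless of bit depth. A minimal sketch of such a check is shown below. It is not Cpdf code, and the function name pngMayBeTransparent is made up for illustration.

```php
<?php
// Hypothetical helper (not part of dompdf or Cpdf): decide whether a PNG can
// contain transparency, based on color type and the presence of a tRNS chunk.
function pngMayBeTransparent(string $bytes): bool
{
    if (strlen($bytes) < 33 || substr($bytes, 0, 8) !== "\x89PNG\r\n\x1a\n") {
        throw new InvalidArgumentException('Not a PNG file');
    }
    // Color type sits at offset 9 within the IHDR data (file offset 25).
    $colorType = ord($bytes[25]);
    if ($colorType === 4 || $colorType === 6) {
        return true; // dedicated alpha channel
    }
    // Types 0, 2 and 3 are transparent only if a tRNS chunk precedes IDAT.
    $offset = 8;
    $length = strlen($bytes);
    while ($offset + 8 <= $length) {
        $chunkLength = unpack('N', substr($bytes, $offset, 4))[1];
        $chunkType = substr($bytes, $offset + 4, 4);
        if ($chunkType === 'tRNS') {
            return true;
        }
        if ($chunkType === 'IDAT' || $chunkType === 'IEND') {
            break; // tRNS must come before the image data
        }
        $offset += 12 + $chunkLength; // length + type + data + CRC
    }
    return false;
}
```

With a check like this, a color type 3 image at bit depth 8 without a tRNS chunk would be reported as opaque, so no mask would need to be generated for it.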
There's a lot to review here, but I imagine we can improve PNG handling. I think, as you noted, we should at least look at the possibility of just embedding the original image if there is no transparency at all.
- I wonder if the size difference, even with masking disabled, is a matter of Cpdf using a true-color image regardless of the bit depth of the original image
Yes, Cpdf always creates a 24-bit image regardless of the original one, and it appears a 24-bit png is always larger than an 8-bit png with identical pixel data. However, there's also another factor, which is the efficiency of the GD or ImageMagick compression algorithm, which will generally be worse than a tool designed for the best possible compression. For example, I use Color Quantizer, which at higher settings can take as much as half a minute to compress one png - an online tool like a pdf generator is unlikely to yield equally small png files because it must be fast.
There's a lot to review here, but I imagine we can improve PNG handling. I think, as you noted, we should at least look at the possibility of just embedding the original image if there is no transparency at all.
Yes, certainly there are a few things to do to improve PNG handling. I think a good first step, and a simple one, would be to enable embedding the original image file and perhaps to skip masking.