
Better use of local hardware?

Open · TPS opened this issue 2 years ago · 12 comments

I almost hate to bring this up (as it seems far-fetched & virtually fantasy @ this point), but is there any way to use locally available hardware resources (if any) to improve oxipng's performance in any aspect? After an hour of Googling, the best compilation of, e.g., GPU techniques I found is the C repo @ https://github.com/BuzzHari/hp-project. Is there anything useful here or anywhere to apply?

TPS avatar Sep 29 '23 09:09 TPS

There are indeed close and interesting connections between models that run efficiently on GPUs and lossless data compression. But as far as I know, there are several major problems with "productionizing" these ideas in software like OxiPNG that have prevented workable solutions from emerging:

  • Not every computer has a fancy NVIDIA GPU, and not every fancy NVIDIA GPU is the same. OpenCL and Vulkan/OpenGL compute shaders can be an option for cross-vendor and cross-platform GPGPU tasks, but I feel that such alternatives have a low mindshare among the researchers willing to develop this stuff compared to the proprietary CUDA API. Also, integrating GPU code into OxiPNG comes at a (very!) significant cost, and it's not clear that a significant fraction of users have GPUs powerful enough that such a GPGPU approach would be beneficial.
  • PNG uses established compression formats that OxiPNG cannot change. The compression layer of PNG files follows the Zlib format, which encapsulates DEFLATE streams, and so far no public research has achieved speedups higher than ~1.2× for DEFLATE compression on GPUs, even on thousands of dollars' worth of NVIDIA A100 GPUs [1][2], in part due to some inherently serial design choices of the DEFLATE format. So at least for compression, a task that typical OxiPNG runs spend quite a bit of time on, moving it to the GPU won't help much.
  • It seems unlikely that other OxiPNG optimization tasks could benefit much from the GPGPU paradigm. I haven't found any literature on GPU-accelerated palette sorting algorithms, for example, but I would suspect that such tasks involve so little data, contribute so little to execution time, and/or don't perform the same unconditional operation over batches of input data that the cost of developing GPGPU algorithms and passing data and instructions back and forth between the GPU and CPU is probably not worth it; speedups would be very hard to realize, if they are possible at all.
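The zlib-wrapping-DEFLATE layering described above can be seen directly with Python's standard `zlib` module. This is just a minimal illustration of the format layering and of why decoding is serial; it is not oxipng-specific code:

```python
import zlib

data = b"filtered scanline bytes " * 64

# PNG's IDAT payload is a zlib stream (RFC 1950): a 2-byte header,
# a raw DEFLATE body (RFC 1951), and a 4-byte Adler-32 trailer.
stream = zlib.compress(data, 9)
assert stream[0] & 0x0F == 8  # CM = 8, the "deflate" method PNG mandates

# The DEFLATE body decodes on its own (negative wbits = no zlib wrapper).
# Its LZ77 back-references point into already-decoded output, which is a
# big part of why the format resists parallel processing.
raw = stream[2:-4]
assert zlib.decompress(raw, wbits=-15) == data
```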

This is just my (hopefully somewhat informed?) take on why there are no good "plug this GPU thing to make OxiPNG 200% faster" things out there yet, and it's unlikely there will ever be. Of course, other people are welcome to chime in.

AlexTMjugador avatar Sep 29 '23 19:09 AlexTMjugador

Just to add to the above points:

  • Oxipng uses the separate libdeflate and zopfli libraries for deflate compression, which takes up the vast majority of optimisation time. Libdeflate in particular is highly optimised for performance, much faster than the zlib used in the study @AlexTMjugador mentioned, and likely outperforms their GPU implementation. But the point is that oxipng can't do anything in this area itself (unless we wanted to write our own deflate compressor, and I for one certainly do not 😅).
  • A smaller part of oxipng's time is spent on filtering and the heuristic algorithms that try to select the best filter. This may or may not be a GPU-friendly task (I'm not really one to know) but I think if we wanted to improve performance here we would likely look first to SIMD optimisations.
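For a concrete picture of the filtering step mentioned above, here is a toy Python version of the PNG "Sub" filter (filter type 1 in the PNG specification). Applying the filter is an independent byte-wise subtraction, exactly the shape of work SIMD handles well, whereas *un*filtering is a serial pass where each byte depends on the previous output:

```python
def sub_filter(scanline: bytes, bpp: int) -> bytes:
    """PNG filter type 1 (Sub): each byte minus the byte one pixel left.
    Every output byte is independent, so this vectorises well."""
    return bytes((scanline[i] - (scanline[i - bpp] if i >= bpp else 0)) & 0xFF
                 for i in range(len(scanline)))

def sub_unfilter(filtered: bytes, bpp: int) -> bytes:
    """Inverse of Sub: each output byte depends on already-reconstructed
    output, making this loop inherently sequential."""
    out = bytearray(filtered)
    for i in range(bpp, len(out)):
        out[i] = (out[i] + out[i - bpp]) & 0xFF
    return bytes(out)

row = bytes(range(0, 120, 2))  # one fake scanline, bpp = 3 (RGB)
assert sub_unfilter(sub_filter(row, 3), 3) == row
```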

andrews05 avatar Sep 30 '23 01:09 andrews05

It's frustrating to have a PNG that's been "optimized" by any (combination of) tools 1 can throw at it, & then find any basic file compression (even OS-based transparent filesystem compression) shrinking it further. It raises the question, "What further/else could be done?"

  • A smaller part of oxipng's time is spent on filtering and the heuristic algorithms that try to select the best filter. This may or may not be a GPU-friendly task (I'm not really one to know) but I think if we wanted to improve performance here we would likely look first to SIMD optimisations.
  • This is the part I was wondering about, actually. There've been mentions in various issues here, & even implementations, of better filter choice (pngwolf & its fork pngwolf-zopfli are the most thorough I'm aware of that aren't completely random or brute-force, yet they're abominably slow), but there's always been the diminishing-returns aspect of really trying to find optimal filters for PNG's compression to work best with. Would leveraging whatever hardware we've access to (short of booting up mega-multi-core cloud computation 😉) be helpful here?

  • I agree that re-implementing deflate here is pointless, &, if any improvements were to be had there, they should definitely be implemented in the libraries we (choose to) rely on.

  • Is there anything to be gained/backported within the standard PNG format from even further afield ideas like lrzip, precomp, preflate, &c?

TPS avatar Sep 30 '23 04:09 TPS

  • The pngwolf genetic filter is slow because it tries a huge number of combinations, compressing each one to see which is best. It's a lot like brute force, except that it's smart about which combinations to try and which to skip. It's necessarily going to be a slow process, again because much of the time is spent on the deflate process.
  • Tools like precomp etc exist precisely because those algorithms can't be applied within the standard PNG format. So unfortunately there's nothing to be gained there.
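To make the genetic-filter idea above concrete, here is a toy sketch of a pngwolf-style search: evolve one filter type per scanline (only None/Sub/Up here, for brevity) and score each candidate by the deflate size of the filtered image. This is purely illustrative and not pngwolf's actual implementation; note how every fitness evaluation runs a full deflate pass, which is why this approach is slow:

```python
import random
import zlib

def ga_filter_search(rows, bpp, generations=10, pop_size=8, seed=0):
    """Toy genetic search over per-scanline PNG filter choices."""
    rng = random.Random(seed)
    n = len(rows)

    def apply_filters(choice):
        out = bytearray()
        prev = bytes(len(rows[0]))  # virtual all-zero scanline above row 0
        for f, row in zip(choice, rows):
            out.append(f)  # per the PNG spec, the filter type leads each line
            if f == 0:     # None
                out += row
            elif f == 1:   # Sub: difference from the byte one pixel left
                out += bytes((row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
                             for i in range(len(row)))
            else:          # Up: difference from the raw scanline above
                out += bytes((row[i] - prev[i]) & 0xFF for i in range(len(row)))
            prev = row
        return bytes(out)

    def fitness(choice):
        # The expensive part: deflate the whole candidate to score it.
        return len(zlib.compress(apply_filters(choice), 9))

    pop = [[rng.randrange(3) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(n)
            child = a[:cut] + b[cut:]           # single-point crossover
            if rng.random() < 0.3:              # occasional mutation
                child[rng.randrange(n)] = rng.randrange(3)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)
```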

I would say the answer to your problem is simply "use a more modern image format", but there is one other potential area where advancements in compression and/or performance may be found: AI.

Here's a recent news article about a paper discussing the use of AI for lossless data compression, including comparisons with PNG (though the AI isn't actually producing PNGs itself). But this is way out of my league. And I imagine there's little interest from experts in applying AI to older formats, because it's much more exciting to explore what the AI can do without being constrained to ancient data specifications. Still, in theory, I'm sure AI could be applied to reorder colour palettes, select filters and construct deflate streams to produce optimised PNGs...
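The core connection between prediction and lossless compression is Shannon's source-coding bound: a model that assigns probability p to the next symbol lets an entropy coder spend about -log2(p) bits on it, so a better predictor means a smaller file. A trivial stdlib illustration of that principle (nothing to do with the linked paper's actual models):

```python
import math
from collections import Counter

def ideal_bits(data: bytes, probs) -> float:
    """Total code length if an ideal entropy coder is driven by `probs`."""
    return sum(-math.log2(probs[b]) for b in data)

data = b"abracadabra" * 20

# A uniform model knows nothing: every byte costs 8 bits.
uniform = {b: 1 / 256 for b in range(256)}

# A model matched to the data's byte frequencies codes it far more tightly;
# learned models (AI included) push this further with context.
counts = Counter(data)
empirical = {b: c / len(data) for b, c in counts.items()}

assert ideal_bits(data, empirical) < ideal_bits(data, uniform)
```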

andrews05 avatar Sep 30 '23 05:09 andrews05

I get the feeling the AI concept is much like "throw more hardware (GPU 😅, SIMD, threads, memory) at the problem": until someone actually does a proof-of-concept & then a community develops it further from there, it's just… vaporware.

TPS avatar Sep 30 '23 05:09 TPS

I would say the answer to your problem is simply "use a more modern image format", but there is one other potential area where advancements in compression and/or performance may be found: AI.

Also, my understanding of the article & chart is that the AI did develop some new "modern image format" that's purely custom to the datasets presented, not (necessarily) a general purpose format like anything developed by humans.

But, unless the AI is able to explain it to humans, the algorithm involved in constructing it might be a "black box" (much like current AIs themselves), so we might never know.

TPS avatar Sep 30 '23 05:09 TPS

Of interest regarding AI: https://cloudinary.com/blog/jpeg-xl-and-automatic-image-quality

andrews05 avatar Nov 17 '23 18:11 andrews05

Interesting in that this AI use seems to be for more efficient use of lossy compression.

TPS avatar Nov 17 '23 19:11 TPS