Sergey "Shnatsel" Davidoff comments

Results: 943 comments of Sergey "Shnatsel" Davidoff

Parallelization opportunities

I think I've overcomplicated parallelizing animations. **Having just two threads - one for decoding, one for compositing - is going to be almost as good as it's going to get.**...
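
To make the two-thread idea concrete, here is a minimal sketch using only the standard library. `Frame`, `decode_frame`, `composite` and `run_pipeline` are hypothetical stand-ins invented for illustration, not `image-webp` APIs; the point is only the shape of the pipeline: one thread decodes, the other composites, and a small bounded channel sits between them.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for a decoded animation frame.
struct Frame {
    pixels: Vec<u8>,
}

/// Hypothetical decoder: pretends that frame `index` decodes to four bytes.
fn decode_frame(index: usize) -> Frame {
    Frame { pixels: vec![index as u8; 4] }
}

/// Hypothetical compositor: folds a frame into the canvas.
/// (A real one would do alpha blending; this just adds bytes.)
fn composite(canvas: &mut [u8], frame: &Frame) {
    for (dst, src) in canvas.iter_mut().zip(&frame.pixels) {
        *dst = dst.wrapping_add(*src);
    }
}

/// Decode on a spawned thread, composite on the calling thread.
fn run_pipeline(frame_count: usize) -> Vec<u8> {
    // Bounded channel: the decoder can run at most 2 frames ahead.
    let (tx, rx) = sync_channel::<Frame>(2);

    let decoder = thread::spawn(move || {
        for i in 0..frame_count {
            // Stop early if the compositing side has gone away.
            if tx.send(decode_frame(i)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receive loop below.
    });

    let mut canvas = vec![0u8; 4];
    for frame in rx {
        composite(&mut canvas, &frame);
    }
    decoder.join().expect("decoder thread panicked");
    canvas
}
```

Frames still arrive in order, so compositing stays sequential (each frame depends on the previous canvas) while decoding of the next frame overlaps with it.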

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

I looked at the profile and the associated code a bit. The two low-hanging optimization opportunities are: 1. Applying the YUV->RGB optimization from #13 to the [RGBA codepath](https://github.com/image-rs/image-webp/blob/ecead22637f625a144830aff6c05b02d185a5d00/src/vp8.rs#L921-L944) as well...

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

Paper on fast alpha blending without divisions: https://arxiv.org/pdf/2202.02864

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

I've attempted to optimize alpha blending by performing it in u16 instead of f64. I got the primitives working (rounding integer division both by 255 and by an arbitrary u8)...
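
For reference, a sketch of one well-known way to do a rounding division by 255 entirely in `u16`, without a division instruction. This is not necessarily the primitive from the comment above; `div_round_255` and `mul_alpha` are names made up for this example.

```rust
/// Rounding division by 255 using shifts and adds only.
/// Exact for every x in 0..=255*255, which covers any product of two u8 values.
fn div_round_255(x: u16) -> u8 {
    debug_assert!(x <= 255 * 255);
    // x + 128 is at most 65153, so it still fits in u16.
    let t = x + 128;
    // t + (t >> 8) is at most 65407, so this cannot overflow either.
    ((t + (t >> 8)) >> 8) as u8
}

/// Multiply a color channel by an alpha value, treating 255 as 1.0.
fn mul_alpha(channel: u8, alpha: u8) -> u8 {
    div_round_255(u16::from(channel) * u16::from(alpha))
}
```

Because 255 is odd there are no exact ties, so the result matches `(x + 127) / 255` for the whole input range.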

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

Thank you! I'll benchmark that and dig deeper into the performance of these things once we actually have a working alpha blending routine. Right now I'm not even sure if...

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

Okay, I checked how libwebp does it, and they actually do it in `u32` rather than `u16`: https://github.com/webmproject/libwebp/blob/e4f7a9f0c7c9fbfae1568bc7fa5c94b989b50872/src/demux/anim_decode.c#L215-L267 We should probably just port that.

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

I've ported the libwebp algorithm. It is really inaccurate at low alpha levels but nobody is going to notice that anyway. It gives an 8% end-to-end performance boost on this...
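
A rough sketch of what `u32` blending of non-premultiplied RGBA can look like, loosely modeled on the approach in the linked libwebp code. It is not a verified port and not the code that landed in `image-webp`; `blend_channel`, `blend_over` and the exact constants are assumptions made for this example.

```rust
/// Blend one color channel; `scale` is (1 << 24) / blended alpha.
fn blend_channel(src_c: u8, src_a: u8, dst_c: u8, dst_factor_a: u8, scale: u32) -> u8 {
    let unscaled =
        u32::from(src_c) * u32::from(src_a) + u32::from(dst_c) * u32::from(dst_factor_a);
    // unscaled <= 255 * blend_a and scale <= 2^24 / blend_a, so the product fits in u32.
    debug_assert!(u64::from(unscaled) * u64::from(scale) < (1u64 << 32));
    ((unscaled * scale) >> 24) as u8
}

/// "Source over destination" for non-premultiplied [R, G, B, A] pixels.
fn blend_over(src: [u8; 4], dst: [u8; 4]) -> [u8; 4] {
    let src_a = src[3];
    if src_a == 0 {
        return dst; // fully transparent source leaves the canvas untouched
    }
    if src_a == 255 {
        return src; // fully opaque source replaces the canvas pixel
    }
    // Approximates dst_a * (255 - src_a) / 255 with a shift instead of a division.
    let dst_factor_a = ((u32::from(dst[3]) * (256 - u32::from(src_a))) >> 8) as u8;
    // dst_factor_a <= 255 - src_a, so this sum cannot overflow u8.
    let blend_a = src_a + dst_factor_a;
    // blend_a >= src_a >= 1, so there is no division by zero.
    let scale = (1u32 << 24) / u32::from(blend_a);
    [
        blend_channel(src[0], src_a, dst[0], dst_factor_a, scale),
        blend_channel(src[1], src_a, dst[1], dst_factor_a, scale),
        blend_channel(src[2], src_a, dst[2], dst_factor_a, scale),
        blend_a,
    ]
}
```

The inaccuracy at low alpha is easy to see in this sketch: with `src_a == 1` and `dst_a == 1` the shift rounds `dst_factor_a` down to 0, so the destination color is ignored entirely even though it should contribute about half of the result.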

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

I turned an `assert!` into a `debug_assert!` and that must have unlocked some huge optimizations because decoding is now 16% faster end-to-end, so the alpha blending function must be ~5x...
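
To illustrate the difference, a small sketch with two hypothetical per-pixel helpers (not the actual `image-webp` function). `assert!` is checked in every build, so the loop carries a possible panic on each element; `debug_assert!` is compiled only when debug assertions are enabled, which by default means it disappears from release builds. Whether that lets the optimizer vectorize a given loop depends on the code and compiler version, so treat the speedup as something to measure rather than assume.

```rust
/// Always-checked variant: the bounds check on each value stays in release builds.
fn scale_checked(values: &mut [u32], scale: u8) {
    for v in values.iter_mut() {
        assert!(*v < (1 << 16));
        *v = (*v * u32::from(scale)) >> 8;
    }
}

/// Debug-only variant: the check is compiled out when debug assertions are off.
fn scale_debug_checked(values: &mut [u32], scale: u8) {
    for v in values.iter_mut() {
        debug_assert!(*v < (1 << 16));
        *v = (*v * u32::from(scale)) >> 8;
    }
}
```

For valid inputs both variants produce identical results; the trade-off is that the second one no longer catches an out-of-range value in release builds.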

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

@awxkee I've replaced libwebp's division approximation with your `div_by_255` and got improved precision without sacrificing performance! Combined with the `image_webp::vp8::Frame::fill_rgba` optimization in #122, we're now 27% faster end-to-end on this...

Decoding animated WebP is 4x slower than `libwebp-sys` + `webp-animation`

That method results in a less precise approximation of the floating-point division, and I'm seeing a greater divergence from the floating-point reference. I believe the trick with the other `div_by_255`...