jpeg-decoder Change color_convert_line_cmyk so it gets auto vectorized

Change color_convert_line_cmyk so it gets auto vectorized

Open Bond-009 opened this issue 6 years ago • 9 comments

Finally came back to this PR, looking back into it changing the code so the compiler can auto vectorize seemed like the best solution

Comparing the generated asm

Sep 14 '19 12:09 Bond-009

Could you elaborate a bit more on this change? I'd be hesitant to add platform specific unsafe SIMD intrinsics unless there was a substantial performance improvement, and no alternative way to get the same speedup.

Nov 23 '19 19:11 fintelia

Maybe this function could be vectorized in a platform-independent manner by applying a bitwise NOT on u32 values instead ?

Nov 24 '19 20:11 lovasoa

Given how simple the loop is, it is definitely possible to get the compiler to autovectorize it. However there is a simpler question: is this function actually a performance bottleneck at all? I'd strongly suspect that the JPEG decoding is much more costly than a single pass over the image data

Nov 24 '19 21:11 fintelia

@fintelia : This crate has no benchmark, making it difficult to take optimization decisions. What would you think about a PR adding one or two benchmarks, and adding cargo bench to the CI ?

Nov 25 '19 09:11 lovasoa

@lovasoa That would be awesome to have

Nov 25 '19 20:11 fintelia

I added a benchmark in #122

Nov 25 '19 23:11 lovasoa

The function in question is now also doing shuffling (i.e. it gathers results from 4 vectors, each with its own component, and interleaves them in the output).

https://github.com/image-rs/jpeg-decoder/blob/09a2c2d81c89014f1477a2507b20f5fa9e3d8e01/src/decoder.rs#L1280-L1296

The shuffling is necessary because parallelization is per-component.

So this optimization is no longer applicable.

May 13 '22 16:05 Shnatsel

It might be worth considering splitting the shuffling and the actual color conversion, so that the color conversion coukd still be vectorized.

May 13 '22 16:05 Shnatsel

I've scanned ~10,000 JPEGs and failed to find any file that would trigger this codepath, so I'm just going to assume this is a rarely taken codepath that's not really worth micro-optimizing.

May 13 '22 18:05 Shnatsel

jpeg-decoder jpeg-decoder copied to clipboard

Change color_convert_line_cmyk so it gets auto vectorized

jpeg-decoder
jpeg-decoder copied to clipboard