jpeg-decoder icon indicating copy to clipboard operation
jpeg-decoder copied to clipboard

Change color_convert_line_cmyk so it gets auto vectorized

Open Bond-009 opened this issue 6 years ago • 9 comments

Finally came back to this PR, looking back into it changing the code so the compiler can auto vectorize seemed like the best solution

Comparing the generated asm

Bond-009 avatar Sep 14 '19 12:09 Bond-009

Could you elaborate a bit more on this change? I'd be hesitant to add platform specific unsafe SIMD intrinsics unless there was a substantial performance improvement, and no alternative way to get the same speedup.

fintelia avatar Nov 23 '19 19:11 fintelia

Maybe this function could be vectorized in a platform-independent manner by applying a bitwise NOT on u32 values instead ?

lovasoa avatar Nov 24 '19 20:11 lovasoa

Given how simple the loop is, it is definitely possible to get the compiler to autovectorize it. However there is a simpler question: is this function actually a performance bottleneck at all? I'd strongly suspect that the JPEG decoding is much more costly than a single pass over the image data

fintelia avatar Nov 24 '19 21:11 fintelia

@fintelia : This crate has no benchmark, making it difficult to take optimization decisions. What would you think about a PR adding one or two benchmarks, and adding cargo bench to the CI ?

lovasoa avatar Nov 25 '19 09:11 lovasoa

@lovasoa That would be awesome to have

fintelia avatar Nov 25 '19 20:11 fintelia

I added a benchmark in #122

lovasoa avatar Nov 25 '19 23:11 lovasoa

The function in question is now also doing shuffling (i.e. it gathers results from 4 vectors, each with its own component, and interleaves them in the output).

https://github.com/image-rs/jpeg-decoder/blob/09a2c2d81c89014f1477a2507b20f5fa9e3d8e01/src/decoder.rs#L1280-L1296

The shuffling is necessary because parallelization is per-component.

So this optimization is no longer applicable.

Shnatsel avatar May 13 '22 16:05 Shnatsel

It might be worth considering splitting the shuffling and the actual color conversion, so that the color conversion coukd still be vectorized.

Shnatsel avatar May 13 '22 16:05 Shnatsel

I've scanned ~10,000 JPEGs and failed to find any file that would trigger this codepath, so I'm just going to assume this is a rarely taken codepath that's not really worth micro-optimizing.

Shnatsel avatar May 13 '22 18:05 Shnatsel