jpeg-decoder

Consider accelerated IDCT?

Open lilith opened this issue 8 years ago • 9 comments

I would imagine that with an accelerated IDCT, JPEG -> YCbCr unscaled planar performance should meet or exceed that of libjpeg-turbo (which hasn't merged the AVX2 routines below yet; I think they handle multiple blocks).

https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jidctflt-avx2-64.asm
https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jfdctflt-avx2-64.asm
https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jfdctmflt-avx2-64.asm

lilith avatar Sep 13 '17 22:09 lilith

Patches for SIMD acceleration are welcome; I don't have the time to implement it myself.

kaksmet avatar Sep 15 '17 19:09 kaksmet

Is this something you would be able to delegate or work on with additional funding?

lilith avatar Apr 09 '20 06:04 lilith

For the foreseeable future the project is mainly limited by dev head count/time. Delegation is not a problem as long as the changes are properly licensed. Porting libjpeg-turbo's SIMD implementation, which seems to be zlib licensed, sounds reasonable. We'd have to figure out how best to test the combination of SIMD/no-SIMD paths with the rest of the code, but I don't see any blocking concerns there either.

197g avatar Apr 09 '20 15:04 197g

I'm considering using this project in https://github.com/imazen/imageflow as the default jpeg decoder, but I expect I will need to expand things like color profile/exif support and add assembly and C to the build to optimize performance-critical bits. I'd like to make sure these changes align with the project direction before investing.

Ideally, I'd like to locate someone who can work on this codec full-time for a while and bring it up to parity with libjpeg-turbo (which is not usable in Rust due to setjmp usage).

lilith avatar Apr 09 '20 16:04 lilith

> Ideally, I'd like to locate someone who can work on this codec full-time for a while and bring it up to parity with libjpeg-turbo (which is not usable in Rust due to setjmp usage).

That would be awesome. As for governance, this crate in particular was previously a separate entity. It moved to the image-rs organization for maintenance; we're open to collaborators, and I have no problem with another developer in a lead role per se.¹

Adding C dependencies or large amounts of unsafe, however, would be controversial and require good arguments for why it is necessary. Adding assembly as a performance optimization, as long as these bits stay optional, is okay. It would be preferable to use Rust's SIMD intrinsics, but that's an opinion that can be changed with a convincing performance argument. We'd just need to figure out a way to avoid the incompatibility concerns/symbol collisions that come with linking through a native interface (e.g. `ring` has gone this route, and using different versions simply doesn't work).
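To make the "optional SIMD, scalar fallback" idea concrete, here is a minimal sketch, not code from this crate. It assumes a hypothetical per-block dequantization step; the names `dequantize`, `dequantize_scalar` and `dequantize_avx2` are made up for illustration. Instead of hand-written intrinsics it recompiles the same scalar body with AVX2 enabled and selects it at run time, so the only unsafe is the guarded call.

```rust
/// Scalar reference path: multiply each coefficient of one 8x8 block by
/// its quantization step. Always available on every target.
#[inline(always)]
fn dequantize_scalar(coeffs: &[i16; 64], quant: &[u16; 64], out: &mut [i32; 64]) {
    for ((o, &c), &q) in out.iter_mut().zip(coeffs).zip(quant) {
        *o = i32::from(c) * i32::from(q);
    }
}

/// Same body, compiled with AVX2 enabled so the compiler may use wider
/// vector instructions. Hypothetical helper, only built on x86_64.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dequantize_avx2(coeffs: &[i16; 64], quant: &[u16; 64], out: &mut [i32; 64]) {
    dequantize_scalar(coeffs, quant, out);
}

/// Public entry point: picks the AVX2 build when the CPU supports it and
/// falls back to the scalar build otherwise.
pub fn dequantize(coeffs: &[i16; 64], quant: &[u16; 64]) -> [i32; 64] {
    let mut out = [0i32; 64];
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at run time just above.
            unsafe { dequantize_avx2(coeffs, quant, &mut out) };
            return out;
        }
    }
    dequantize_scalar(coeffs, quant, &mut out);
    out
}
```

Because both paths share one body, the SIMD and no-SIMD variants can be tested against each other directly; whether this is fast enough compared with ported assembly would still have to be measured.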

¹Random thought: Maybe Embark Studios would also be interested and/or have someone in mind?

197g avatar Apr 10 '20 16:04 197g

I believe the use case for C would be the optimized Huffman decoder (probably optional), but for assembly I'd like to keep the files in sync with the audited codebase of libjpeg-turbo if possible: https://github.com/libjpeg-turbo/libjpeg-turbo/tree/master/simd/x86_64

I'm not sure how to resolve multi-versioning with needing to use NASM specifically. edit: perhaps a preprocessing build step could mangle the function names?

lilith avatar Apr 10 '20 16:04 lilith

Even before re-implementing it in assembly, I think there is a lot that can be done to improve the performance of the existing IDCT code in safe Rust.

I am in no way a specialist, but a quick look at the generated assembly makes me think it is far from optimal. There must be a way to make the compiler use at least some SIMD instructions by changing how we organize statements there.

lovasoa avatar Apr 10 '20 18:04 lovasoa

There was similar (well, somewhat similar) code converting between color representations in image/webp. There are likely some lessons to be learned from it, in particular about when Rust/LLVM is able to elide bounds checks on slice accesses and consequently auto-vectorize simple loops.
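As an illustration of the kind of loop shape this refers to, here is a hedged sketch of a YCbCr to RGB conversion written so that no indexed slice access with a data-dependent bound remains. It is not the image/webp or jpeg-decoder code; the function name `ycbcr_to_rgb` and the 16-bit fixed-point constants (rounded JFIF coefficients) are assumptions made for this example.

```rust
/// Convert planar Y, Cb, Cr samples to interleaved RGB.
/// Processes as many pixels as the shortest input (or the output) allows.
pub fn ycbcr_to_rgb(y: &[u8], cb: &[u8], cr: &[u8], rgb: &mut [u8]) {
    // Iterating with chunks_exact_mut + zip instead of `rgb[3 * i]`,
    // `y[i]`, ... means the loop has no per-element index to check, which
    // is the shape the optimizer can usually vectorize.
    for (((px, &y), &cb), &cr) in rgb.chunks_exact_mut(3).zip(y).zip(cb).zip(cr) {
        let y = i32::from(y);
        let cb = i32::from(cb) - 128;
        let cr = i32::from(cr) - 128;

        // Fixed-point JFIF conversion, coefficients scaled by 2^16,
        // with 32768 added for rounding before the shift.
        let r = y + ((91881 * cr + 32768) >> 16);
        let g = y - ((22554 * cb + 46802 * cr + 32768) >> 16);
        let b = y + ((116130 * cb + 32768) >> 16);

        px[0] = r.clamp(0, 255) as u8;
        px[1] = g.clamp(0, 255) as u8;
        px[2] = b.clamp(0, 255) as u8;
    }
}
```

Whether the compiler actually vectorizes this depends on the target and compiler version, so the generated assembly is still worth inspecting; an equivalent alternative is to re-slice all inputs to a common length once before an indexed loop.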

197g avatar Apr 10 '20 18:04 197g

After 4 years I finally was able to overcome the blocking issue I had with using libjpeg-turbo, so I probably won't be using this as the primary decoder, but I think I will keep it as a backup. Thanks!

lilith avatar Apr 10 '20 23:04 lilith