Consider accelerated IDCT?
I would imagine that with an accelerated IDCT, JPEG -> YCbCr unscaled planar decoding performance should meet or exceed libjpeg-turbo's. libjpeg-turbo hasn't merged the AVX2 routines linked below yet; I think they handle multiple blocks at once.
https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jidctflt-avx2-64.asm https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jfdctflt-avx2-64.asm https://github.com/huaqiangwang/libjpeg-turbo/blob/avx2-dev/simd/jfdctmflt-avx2-64.asm
Patches for SIMD acceleration are welcome; I don't have the time to implement it myself.
Is this something you would be able to delegate or work on with additional funding?
For the foreseeable future, the project is mainly limited by developer head count and time. Delegation is not a problem as long as the changes are properly licensed. Porting libjpeg-turbo's SIMD implementation, which seems to be zlib licensed, sounds reasonable. We'd have to figure out how best to test the SIMD and non-SIMD paths in combination with the rest of the code, but I don't see any blocking concerns there either.
I'm considering using this project in https://github.com/imazen/imageflow as the default JPEG decoder, but I expect I will need to expand things like color profile/EXIF support and add assembly and C to the build to optimize performance-critical bits. I'd like to make sure these changes align with the project direction before investing.
Ideally, I'd like to locate someone who can work on this codec full-time for a while and bring it up to parity with libjpeg-turbo (which is not usable in Rust due to setjmp usage).
That would be awesome. As for governance, this crate in particular was previously a separate entity. It moved to the image-rs organization for maintenance; we're open to collaborators, and I have no problem with another developer in a lead role per se.¹
Adding C dependencies or large amounts of unsafe, however, would be controversial and require good arguments for why it is necessary. Adding assembly as a performance optimization is okay, as long as these bits stay optional. It would be preferable to use Rust's SIMD intrinsics, but that's an opinion that can be changed with a convincing performance argument. We'd just need to figure out a way to avoid the incompatibility concerns and symbol collisions that come with linking through a native interface (e.g. `ring` has gone this route, and using different versions simply doesn't work).
¹Random thought: Maybe Embark Studios would also be interested and/or have someone in mind?
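To make the "optional SIMD intrinsics" route concrete, here is a minimal sketch of runtime dispatch with a scalar fallback. It is not code from this crate: the function names (`level_shift`, `level_shift_scalar`, `level_shift_sse2`) are hypothetical, and the operation shown is only the final level-shift-and-clamp step applied to one row of IDCT output, not a full IDCT. The dispatch pattern follows the one documented for `std::arch`.

```rust
/// Scalar fallback: add 128 to each sample and clamp to 0..=255.
pub fn level_shift_scalar(input: &[i16; 8], out: &mut [u8; 8]) {
    for (o, &v) in out.iter_mut().zip(input) {
        *o = (i32::from(v) + 128).clamp(0, 255) as u8;
    }
}

/// SSE2 version of the same step (hypothetical helper, x86_64 only).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn level_shift_sse2(input: &[i16; 8], out: &mut [u8; 8]) {
    use std::arch::x86_64::*;
    // Unaligned load of eight i16 samples.
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    // Saturating add keeps large inputs from wrapping around.
    let shifted = _mm_adds_epi16(v, _mm_set1_epi16(128));
    // Pack to u8 with unsigned saturation, i.e. clamp to 0..=255.
    let packed = _mm_packus_epi16(shifted, shifted);
    // Store only the low 64 bits (eight bytes).
    _mm_storel_epi64(out.as_mut_ptr() as *mut __m128i, packed);
}

/// Public entry point: picks the SIMD path at runtime when available.
pub fn level_shift(input: &[i16; 8], out: &mut [u8; 8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 support was just checked, and both arrays
            // are exactly eight elements long.
            unsafe { level_shift_sse2(input, out) };
            return;
        }
    }
    level_shift_scalar(input, out)
}
```

Because everything is plain Rust, there are no external symbols to collide across crate versions, and keeping the scalar function callable makes it straightforward to compare both paths in tests.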
I believe the use case for C would be the optimized Huffman decoder (probably optional), but for assembly I'd like to keep the files in sync with the audited codebase of libjpeg-turbo if possible. https://github.com/libjpeg-turbo/libjpeg-turbo/tree/master/simd/x86_64
I'm not sure how to resolve multi-versioning with needing to use NASM specifically. edit: perhaps a preprocessing build step could mangle the function names?
Even before re-implementing it in assembly, I think there is a lot that can be done to improve the performance of the existing IDCT code in safe Rust.
I am in no way a specialist, but a quick look at the generated assembly makes me think it is far from optimal. There must be a way to get the compiler to use at least some SIMD instructions by changing how we organize the statements there.
There was similar (well, somewhat similar) code converting between color representations in image/webp. There are likely some lessons that can be learned from it, in particular about when Rust/LLVM is able to elide bounds checks on slice accesses and is consequently able to automatically vectorize simple loops.
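As a rough illustration of the bounds-check point, here is a sketch of a YCbCr to RGB loop written so that no per-element index checks are needed. This is not the image/webp code: the function name `ycbcr_to_rgb` is made up for the example, and the constants are the usual JFIF coefficients in 16.16 fixed point. Whether LLVM actually vectorizes it depends on the compiler version and target, so the generated assembly should still be inspected.

```rust
/// Converts planar YCbCr to interleaved RGB (hypothetical helper).
/// Processes as many pixels as the shortest input allows.
pub fn ycbcr_to_rgb(y: &[u8], cb: &[u8], cr: &[u8], rgb: &mut [u8]) {
    // Establish one common length up front so the loop body
    // never needs a per-element bounds check.
    let n = y.len().min(cb.len()).min(cr.len()).min(rgb.len() / 3);
    let (y, cb, cr) = (&y[..n], &cb[..n], &cr[..n]);
    let rgb = &mut rgb[..n * 3];

    // chunks_exact_mut + zip iterate without indexing the inputs.
    for (((px, &y), &cb), &cr) in rgb.chunks_exact_mut(3).zip(y).zip(cb).zip(cr) {
        let y = i32::from(y) << 16;
        let cb = i32::from(cb) - 128;
        let cr = i32::from(cr) - 128;
        // 16.16 fixed point with rounding: 1.402, 0.34414, 0.71414, 1.772.
        let r = (y + 91881 * cr + 32768) >> 16;
        let g = (y - 22554 * cb - 46802 * cr + 32768) >> 16;
        let b = (y + 116130 * cb + 32768) >> 16;
        px[0] = r.clamp(0, 255) as u8;
        px[1] = g.clamp(0, 255) as u8;
        px[2] = b.clamp(0, 255) as u8;
    }
}
```

The same idea (fix the lengths once, then iterate with `zip` and exact chunks) should carry over to the row and column passes of the IDCT, where the block size is a compile-time constant.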
After 4 years I finally was able to overcome the blocking issue I had with using libjpeg-turbo, so I probably won't be using this as the primary decoder, but I think I will keep it as a backup. Thanks!