New DDS decoder

Open RunDevelopment opened this issue 8 months ago • 4 comments

I made a new DDS decoder, because the old DXT-based decoder was very limited (only DXT1-5 + dimensions divisible by 4) and incorrect (DXT1 colors were not rounded correctly, resulting in discoloration). While this PR is not finished yet, I have already implemented the following features:

Support for DXT1-5 and BC1-5 (including arbitrary image sizes).
Support for all (s)normalized and floating point uncompressed DX10 RGB formats. So most of DXGI_FORMAT.
Support for many DX9 pixel formats. This is done by mapping DDS pixel formats to DXGI_FORMAT (almost; I actually map to an enum of supported formats, but this enum closely follows DXGI_FORMAT). This means that there are DX9 pixel formats the new DDS decoder does not support, but I couldn't produce or find any DDS files that use them.
Support for non-standard DDS files. Some old and current encoders use custom flags not mentioned in the official docs (e.g. here are the pixel format flags TexConv uses vs the officially documented ones). I used the same flags as TexConv and ignored unknown flags. This allows the new DDS decoder to support more old DDS files with uncommon and non-standard formats.
Support for cubemaps. Cubemaps are displayed in a 4x3 grid that shows the unfolded cube. Example:
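To make the 4x3 grid concrete, here is a minimal sketch of how the six faces could be placed. The face order (+X, -X, +Y, -Y, +Z, -Z) is the order DDS stores cubemap faces in. The horizontal-cross arrangement and the names `CubeFace`, `face_cell`, `face_origin`, and `unfolded_size` are assumptions for illustration, not necessarily what the decoder does.

```rust
/// Cubemap faces in the order they are stored in a DDS file.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CubeFace {
    PositiveX,
    NegativeX,
    PositiveY,
    NegativeY,
    PositiveZ,
    NegativeZ,
}

/// Grid cell (column, row) of a face inside the 4x3 grid.
/// ASSUMPTION: a common horizontal-cross layout; the real decoder
/// may arrange the faces differently.
pub fn face_cell(face: CubeFace) -> (u32, u32) {
    match face {
        CubeFace::PositiveY => (1, 0), // top
        CubeFace::NegativeX => (0, 1), // left
        CubeFace::PositiveZ => (1, 1), // front
        CubeFace::PositiveX => (2, 1), // right
        CubeFace::NegativeZ => (3, 1), // back
        CubeFace::NegativeY => (1, 2), // bottom
    }
}

/// Pixel position of a face's top-left corner in the unfolded image.
pub fn face_origin(face: CubeFace, face_size: u32) -> (u32, u32) {
    let (col, row) = face_cell(face);
    (col * face_size, row * face_size)
}

/// Size (width, height) of the unfolded image for square faces.
pub fn unfolded_size(face_size: u32) -> (u32, u32) {
    (4 * face_size, 3 * face_size)
}
```

With 64x64 faces this gives a 256x192 image in which six of the twelve cells are filled and the rest stay empty.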

The only main formats that are still missing are BC6 and BC7. These 2 are quite complex, so it will likely take me some more time to implement them. Once those 2 are implemented, this should be a pretty competent DDS decoder.

Some notes and technical decisions:

I use &mut dyn Read in all format decoders to reduce binary size. Since there are a lot of DDS formats, there are a lot of functions to decode these formats. I think the binary size could be reduced even further, so I'm very open to feedback in that regard.
I put particular care into ensuring that the decoded colors are correctly rounded. DirectX uses float32 everywhere internally to read DDS files. So to output a 5-bit value as 8-bit, DX does (x as f32 / 31.0 * 255.0).round() as u8 (= first convert to a f32 value in the range 0-1, and then convert to u8 with rounding). Doing this with f32 is quite slow, so I did it with only integer operations, which are around 3x faster than f32 and around 2x harder to understand. The tricks I use are explained in x5_to_x8 in convert.rs.
I already added the code to read arbitrary surfaces from a DDS file. So reading mip chains, volumes, and image arrays can be easily supported. There's just not much use for it right now aside from cubemaps.
I used Paint.net to determine what DDS files should look like. I did this because I have not found a single DDS file it decoded incorrectly so far, so I used it as my source of truth. Paint.net uses DirectXTex (the same library that texconv uses) under the hood AFAIK, so everything should be correct.
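As an illustration of the rounding note above, here is a minimal sketch of 5-bit to 8-bit conversion done three ways: the f32 reference from the text, an exact integer rounded division, and a multiply-and-shift variant. The function names are hypothetical, and the constants 527 and 23 are a widely used trick for this conversion; the actual x5_to_x8 in convert.rs may be implemented differently.

```rust
/// Reference conversion through f32, as described in the text.
pub fn x5_to_x8_reference(x: u8) -> u8 {
    debug_assert!(x < 32);
    (x as f32 / 31.0 * 255.0).round() as u8
}

/// Integer-only rounded division: round(x * 255 / 31).
/// Adding 15 (= floor(31 / 2)) before dividing rounds to nearest;
/// ties cannot occur because 31 is odd.
pub fn x5_to_x8_div(x: u8) -> u8 {
    debug_assert!(x < 32);
    ((x as u32 * 255 + 15) / 31) as u8
}

/// Multiply-and-shift variant without a division.
/// ASSUMPTION: these constants are a commonly used choice, not
/// necessarily the ones used in convert.rs.
pub fn x5_to_x8_shift(x: u8) -> u8 {
    debug_assert!(x < 32);
    ((x as u32 * 527 + 23) >> 6) as u8
}
```

Since there are only 32 possible inputs, the integer versions can be checked exhaustively against the f32 reference, which is a cheap way to guard this kind of optimization in a unit test.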

With what I did so far out of the way, I have some questions for the maintainers about how to integrate this into the image crate:

How should I test this? I used around 20MB of images (>100 files) and tests/reference_image.rs > render_images to ensure that the decoder works correctly, but 20MB is a lot of image data to commit to a repo. I can go down to around 8MB, but not much lower. The issue is that I only have one file for some rare formats and no way to generate smaller images of those formats.
Benchmarking. I have not benchmarked decoding DDS files yet (although it shouldn't be too slow). The current bench setup also seems to be ill-suited for DDS, as I need to benchmark each format individually. How should I go about this?
Should this be in a separate crate? I initially wanted to make this a separate crate, but found it vastly easier to add it to the image crate.
What should we do with the old DXT implementation? It's not used by the new DDS decoder, and it has been deprecated for some time now. Maybe it could be removed after the new DDS decoder is in?
How careful do I have to be with resource limits? Most formats currently read line-by-line into a temporary buffer from which pixels are then decoded. The size of this buffer is proportional to the width of the image. This means that we only need O(sqrt(N)) (N=number of pixels) additional memory for roughly square images. However, an attacker could supply an image with height=1, causing the temporary buffer to be as large as (but no larger than) the output buffer. I already limited the width and height of DDS images to be at most 2²⁴, so the temporary buffer can be at most 256MB (with any R32G32B32A32_* format).
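The arithmetic behind the 256MB figure can be sketched as follows: 2²⁴ pixels per row times 16 bytes per pixel (four 32-bit channels) is 2²⁸ bytes. The function `row_buffer_size` and its parameters `bytes_per_pixel` and `max_alloc` are hypothetical names for illustration, not part of the PR or of the image crate's limits API.

```rust
use std::convert::TryFrom;

/// Upper bound on width and height mentioned in the text (2^24).
pub const MAX_DIMENSION: u32 = 1 << 24;

/// Size in bytes of a one-row temporary buffer, or `None` if the image
/// should be rejected. `max_alloc` is a hypothetical caller-supplied limit.
pub fn row_buffer_size(
    width: u32,
    height: u32,
    bytes_per_pixel: u32,
    max_alloc: u64,
) -> Option<usize> {
    // Reject empty images and dimensions above the hard limit.
    if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
        return None;
    }
    // Compute in u64 with overflow checking before comparing to the limit.
    let bytes = u64::from(width).checked_mul(u64::from(bytes_per_pixel))?;
    if bytes > max_alloc {
        return None;
    }
    usize::try_from(bytes).ok()
}
```

For the worst case described above (width 2²⁴, height 1, 16 bytes per pixel) this returns 268,435,456 bytes when no tighter limit is given, and a caller-supplied `max_alloc` lets that case be rejected before anything is allocated.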

Also (not a question), this is my first large-scale Rust PR, so please feel free to pick apart and suggest improvements to everything you see.

Jun 09 '24 13:06 RunDevelopment
