
Replace Compressonator with bc7enc for faster/higher quality textures

Open jonnyawsom3 opened this issue 2 months ago • 7 comments

Is your feature request related to a problem? Please describe.

Currently, using BC7 requires minutes of encoding, either locally or via the variant system. This means it takes a very long time to work on high-quality assets/models.

Describe the solution you'd like

Initial testing shows bc7enc has 10x faster encode time, 20x lower CPU time and higher quality than Compressonator. This would significantly reduce the cost of encoding textures while making them more responsive. bc7enc also supports BC1-5, all faster and more efficient than Compressonator, with 2x less CPU time than Crunch.

It should be noted that the CLI tool for bc7enc runs benchmarks I can't disable; internally it reports 0.87 seconds for Fast and 10 seconds for Best. There are also no comparison images, as the differences are nearly imperceptible to the naked eye.

| 4K Texture Results | AMD (Compressonator) | bc7enc Fast | bc7enc Best |
| --- | --- | --- | --- |
| SSIMU2 (Quality) | 93.94 | 93.86 | 93.99 |
| PeakWorkingSetSize | 153.1 MiB | 172.5 MiB | 175.8 MiB |
| PeakPagefileUsage | 849.4 MiB | 175.2 MiB | 178.5 MiB |
| Real time (seconds) | 151.66 | 5.33 | 15.83 |
| CPU time (seconds) | 2318.88 | 12.92 | 127.22 |
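For readers who want to check the headline ratios against the table, here is a minimal sketch that derives them. The numbers are copied from the table above; the dictionary layout and the `compare` helper are only illustrative names, not part of any tool mentioned in this issue.

```python
# Figures copied from the 4K texture results table above.
results = {
    "AMD Compressonator": {"ssimu2": 93.94, "real_s": 151.66, "cpu_s": 2318.88},
    "bc7enc Fast": {"ssimu2": 93.86, "real_s": 5.33, "cpu_s": 12.92},
    "bc7enc Best": {"ssimu2": 93.99, "real_s": 15.83, "cpu_s": 127.22},
}


def compare(baseline: str, candidate: str) -> dict:
    """Return speedups and the SSIMU2 delta of candidate relative to baseline."""
    b, c = results[baseline], results[candidate]
    return {
        "real_speedup": round(b["real_s"] / c["real_s"], 1),
        "cpu_speedup": round(b["cpu_s"] / c["cpu_s"], 1),
        "ssimu2_delta": round(c["ssimu2"] - b["ssimu2"], 2),
    }


for name in ("bc7enc Fast", "bc7enc Best"):
    print(name, compare("AMD Compressonator", name))
```

With these inputs, Best works out to roughly 10x less real time and 18x less CPU time than Compressonator, which is where the "10x" and "20x" figures come from; Fast is roughly 28x faster in real time at a 0.08 lower SSIMU2 score.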

With those timings, it's actually faster than Compressonator's BC3 encoding, allowing for direct replacement with BC7 for much higher quality, using the same VRAM and CPU time.

[Comparison images: AMD BC3 (1.3 seconds) vs. bc7enc Fast (0.87 seconds)]

Describe alternatives you've considered

Setting textures to direct load/uncompressed on import with a mod, however this means much higher VRAM usage and download size.

Additional Context

This should be assigned to @ProbablePrime as a replacement to #2488, following up on all the work to Compressonator.

Requesters

Jonnyawsom3, draycethevoidangel, yoshiyoshiyoshiyoshiyoshyoshyosh, raidriar796

jonnyawsom3, Sep 29 '25 18:09

I made a fork of bc7enc that has FFI support and a few small efficiency PRs merged in, a very thin C# wrapper around that fork, and a Resonite mod that uses the wrapper to encode the BCn_LZMA textures.

The results are pretty damning. Here's me encoding the 2k and 4k mipmaps of a 4k texture in BC1_LZMA, BC3_LZMA, BC4_LZMA, and BC7_LZMA:

// bc7enc
1:48:16.347: [BC7EncMod] Encoded 2048x2048 texture to BC1 in 0.9145352 seconds.
1:48:20.858: [BC7EncMod] Encoded 4096x4096 texture to BC1 in 3.4698624 seconds.
1:48:38.771: [BC7EncMod] Encoded 2048x2048 texture to BC3 in 0.8500086 seconds.
1:48:44.773: [BC7EncMod] Encoded 4096x4096 texture to BC3 in 2.8852035 seconds.
1:49:10.890: [BC7EncMod] Encoded 2048x2048 texture to BC4 in 0.4732731 seconds.
1:49:13.642: [BC7EncMod] Encoded 4096x4096 texture to BC4 in 1.8550868 seconds.
1:49:28.890: [BC7EncMod] Encoded 2048x2048 texture to BC7 in 3.0741065 seconds.
1:49:42.032: [BC7EncMod] Encoded 4096x4096 texture to BC7 in 11.0810637 seconds.

// Compressonator
1:54:28.390: [BC7EncMod] Encoded 2048x2048 texture to BC1 in 0.3661101 seconds.
1:54:30.625: [BC7EncMod] Encoded 4096x4096 texture to BC1 in 1.210149 seconds.
1:54:50.585: [BC7EncMod] Encoded 2048x2048 texture to BC3 in 0.3623067 seconds.
1:54:55.006: [BC7EncMod] Encoded 4096x4096 texture to BC3 in 1.3783381 seconds.
1:55:24.154: [BC7EncMod] Encoded 2048x2048 texture to BC4 in 0.3122833 seconds.
1:55:26.163: [BC7EncMod] Encoded 4096x4096 texture to BC4 in 1.1044985 seconds.
1:59:08.938: [BC7EncMod] Encoded 2048x2048 texture to BC7 in 158.4507517 seconds.
2:08:52.249: [BC7EncMod] Encoded 4096x4096 texture to BC7 in 581.1169189 seconds.

BC1, BC3, and BC4 are a bit slower than Compressonator, but the absolute times are small either way. They appear to produce textures of roughly the same, if not slightly higher, quality than Compressonator, so the quality setting could probably be tuned down slightly for them (though I don't yet have a way to configure encoding parameters in my bindings; everything just defaults to the highest quality available).
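To put the two logs side by side, the sketch below parses the lines above and prints a Compressonator-to-bc7enc time ratio per format and resolution. The log text is copied verbatim from this comment; the regex and the `parse` helper are illustrative and assume every line follows the "Encoded WxH texture to BCn in N seconds" pattern shown.

```python
import re

BC7ENC_LOG = """\
1:48:16.347: [BC7EncMod] Encoded 2048x2048 texture to BC1 in 0.9145352 seconds.
1:48:20.858: [BC7EncMod] Encoded 4096x4096 texture to BC1 in 3.4698624 seconds.
1:48:38.771: [BC7EncMod] Encoded 2048x2048 texture to BC3 in 0.8500086 seconds.
1:48:44.773: [BC7EncMod] Encoded 4096x4096 texture to BC3 in 2.8852035 seconds.
1:49:10.890: [BC7EncMod] Encoded 2048x2048 texture to BC4 in 0.4732731 seconds.
1:49:13.642: [BC7EncMod] Encoded 4096x4096 texture to BC4 in 1.8550868 seconds.
1:49:28.890: [BC7EncMod] Encoded 2048x2048 texture to BC7 in 3.0741065 seconds.
1:49:42.032: [BC7EncMod] Encoded 4096x4096 texture to BC7 in 11.0810637 seconds.
"""

COMPRESSONATOR_LOG = """\
1:54:28.390: [BC7EncMod] Encoded 2048x2048 texture to BC1 in 0.3661101 seconds.
1:54:30.625: [BC7EncMod] Encoded 4096x4096 texture to BC1 in 1.210149 seconds.
1:54:50.585: [BC7EncMod] Encoded 2048x2048 texture to BC3 in 0.3623067 seconds.
1:54:55.006: [BC7EncMod] Encoded 4096x4096 texture to BC3 in 1.3783381 seconds.
1:55:24.154: [BC7EncMod] Encoded 2048x2048 texture to BC4 in 0.3122833 seconds.
1:55:26.163: [BC7EncMod] Encoded 4096x4096 texture to BC4 in 1.1044985 seconds.
1:59:08.938: [BC7EncMod] Encoded 2048x2048 texture to BC7 in 158.4507517 seconds.
2:08:52.249: [BC7EncMod] Encoded 4096x4096 texture to BC7 in 581.1169189 seconds.
"""

# Captures the texture width, the target format, and the elapsed seconds.
LINE = re.compile(r"Encoded (\d+)x\d+ texture to (BC\d) in ([\d.]+) seconds")


def parse(log: str) -> dict:
    """Map (format, width) -> seconds for every matching log line."""
    return {(m[2], int(m[1])): float(m[3]) for m in LINE.finditer(log)}


bc7enc = parse(BC7ENC_LOG)
compressonator = parse(COMPRESSONATOR_LOG)

for key in sorted(bc7enc):
    # A ratio above 1 means bc7enc was faster; below 1 means it was slower.
    ratio = compressonator[key] / bc7enc[key]
    print(f"{key[0]} {key[1]}px: Compressonator/bc7enc time ratio = {ratio:.2f}")
```

On these logs the BC7 ratios come out a little above 50, while the BC1, BC3, and BC4 ratios are all below 1, matching the description above.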

BC7 is the real magic here. The results generated by bc7enc are higher quality than Compressonator at a fraction of the encoding time. Here's a normal map that was generated with Compressonator:

[Image: normal map encoded with Compressonator]

And here's the same normal map with bc7enc:

[Image: the same normal map encoded with bc7enc]

You can see that the bc7enc version keeps slightly finer details compared to the Compressonator version.

bc7enc also allows for BC5 encoding, presumably at a fast speed, so that'd be something to look out for when it comes to the potential of using BC5 for normal maps.

Overall, this only took me about 2 days to put together, so it'd be cool to see it in the vanilla game.

yoshiyoshyosh, Nov 17 '25 20:11

This seems promising, but it raises a question: how are you comparing the quality of the textures? Is this just visually looking at them?

That'd be my primary concern: is it just a way more efficient implementation, or is it taking some tradeoffs? How well does it work for a wide variety of textures? The ones shown seem pretty simple and typical, but I'd want to see some more complex textures with lots of color variation (e.g. color noise textures are probably as challenging as it gets).

My other concern is, does bc7enc have full feature parity with what we use from Compressonator?

Frooxius, Nov 17 '25 20:11

@yoshiyoshyosh Can you shove some images through: https://github.com/cloudinary/ssimulacra2

There's also: https://github.com/Yellow-Dog-Man/Ssimulacra2.NET

ProbablePrime, Nov 17 '25 23:11

@Frooxius

how are you comparing the quality of the textures? Is this just visually looking at them?

For me, as of right now, I am just kind of eyeballing it. I do want to make/use a comprehensive texture test suite that has a ton of images of varying styles from the simple to the complex and compare with SSIMU2, but I won't be able to get to that for at least a few days.

What I can tell you for BC7 is that both bc7enc (at max quality--what I am using for all this) and Compressonator (at 0.25 quality--what Resonite uses) are well within the "perceptually lossless" range of quality. Both have imperceptible differences compared to the lossless texture.

When Compressonator is used near its maximum quality (0.6 <= Q <= 1), it will probably beat out bc7enc in numerical quality metrics, but that would come at the expense of several minutes of encoding time (see prime's devlog post back here). Resonite obviously doesn't want that, and as such uses quality 0.25, which still takes a few minutes for 4k textures on my machine. It is also important to stress that the quality increase is imperceptible and the numerical returns are near zero at this level.

is it just a way more efficient implementation, or is it taking some tradeoffs?

It is mainly just an extremely efficient implementation. The author takes extreme advantage of CPU vectorization to squeeze all they can while retaining high quality. You can read about it on their blog:

  • https://richg42.blogspot.com/2018/04/a-tale-of-multiple-bc7-encoders.html
  • https://richg42.blogspot.com/2018/04/bc7-showdown-2-basis-ispc-vs-nvidia.html
  • https://richg42.blogspot.com/2018/05/graphing-our-bc7-encoders-quality-vs.html
  • https://richg42.blogspot.com/2021/02/average-rate-distortion-curves-for.html (this is the current "modern" version of the library; the previous ones were old versions)

The author has pretty much dedicated several years to this kind of compression.

My other concern is, does bc7enc have full feature parity with what we use from Compressonator?

For one, bc7enc only supports BC1, BC3, BC4, BC5, and BC7. It does not support BC2, BC6, any form of "crunch" compression after doing block compression, or any other texture compression format (ETC, ASTC). crunch will still need to stick around for crunch compression, and Compressonator will still need to stick around for BC6/ETC/ASTC texture compression.
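The split described above could be expressed as a small routing function. This is a hypothetical sketch: `pick_encoder`, the format sets, and the string labels are invented for illustration and are not Resonite's actual code. BC2 is deliberately left unrouted because the comment does not say which library would handle it.

```python
# Formats the comment above says bc7enc supports.
BC7ENC_FORMATS = {"BC1", "BC3", "BC4", "BC5", "BC7"}

# Formats the comment above says Compressonator would still be needed for.
COMPRESSONATOR_FORMATS = {"BC6", "ETC", "ASTC"}


def pick_encoder(fmt: str, crunched: bool = False) -> str:
    """Return the (hypothetical) library label that would encode this format."""
    if crunched:
        # bc7enc offers no crunch-style compression, so crunch stays in use.
        return "crunch"
    if fmt in BC7ENC_FORMATS:
        return "bc7enc"
    if fmt in COMPRESSONATOR_FORMATS:
        return "compressonator"
    # Anything else (e.g. BC2) is not covered by the comment above.
    raise ValueError(f"no encoder assigned for format {fmt!r}")


for fmt in ("BC7", "BC5", "BC6", "ASTC"):
    print(fmt, "->", pick_encoder(fmt))
```

Raising on unknown formats rather than silently falling back keeps any gap in coverage visible during testing.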

In terms of other features, it seems like Resonite doesn't really use much that is specific to Compressonator. While Compressonator has native mipmap support and bc7enc does not, Resonite doesn't use Compressonator's mipmap generation and instead generates its own mipmaps.

One thing that bc7enc currently does not have is the ability to directly read from a pointer and write to a pointer. This causes two unnecessary copies of texture data within the native library. I plan to add support for this in bc7enc myself, though.

Everything else just seems to be compression tuning options, which don't really matter in the grand scheme of things, as the ends (texture quality) justify the means here.

@ProbablePrime

Can you shove some images through: https://github.com/cloudinary/ssimulacra2

Yeah, I already had this downloaded and plan to do so. Again, just give me a bit.

Do let me know if you have any more questions/concerns.

yoshiyoshyosh, Nov 17 '25 23:11

I included a sample of my SSIMULACRA2 metric scores at the top for Compressonator and both 'modes' of bc7enc's BC7 compression. Either it's very slightly worse (-0.08 is tiny) and 30x faster, or slightly higher quality and 10x faster. This applied to around a dozen different textures I tried.

The color noise texture is a good idea. BC formats simply don't have enough precision to preserve noise well, so we'll be able to see the worst case of both libraries. I'll run some tests later and upload the results, trying all BC formats on the noise for preliminary results.

As Yosh said, Compressonator currently already takes minutes at the minimum viable quality threshold. bc7enc should allow much more control, while already being an order of magnitude faster even at its slowest setting.

Crunch may not be needed, as bc7enc includes RDO optimisation, allowing a trade-off of quality against LZMA-compressed size (for all its formats, not just BC1 and BC3 like Resonite currently has). I'll have to run more tests and comparisons first though. Edit: Hard to tell. First results look worse than Crunch at the same size, but there are parameters to tweak that I haven't explored.
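As a rough illustration of why giving up some precision can shrink LZMA output, here is a toy sketch using Python's standard `lzma` module. It is not bc7enc's RDO algorithm: it simply zeroes the low 4 bits of random bytes as a stand-in for "lossy simplification before compression", and the data and sizes are synthetic.

```python
import lzma
import random

# Synthetic stand-in for encoded texture data: 64 KiB of seeded random bytes.
rng = random.Random(0)
raw = bytes(rng.randrange(256) for _ in range(64 * 1024))

# Toy "quality for size" trade: drop the low 4 bits of every byte.
# This loses information (quality) but leaves fewer distinct values to encode.
quantized = bytes(b & 0xF0 for b in raw)

raw_size = len(lzma.compress(raw))
quantized_size = len(lzma.compress(quantized))

print(f"original:  {len(raw)} bytes -> {raw_size} bytes after LZMA")
print(f"quantized: {len(quantized)} bytes -> {quantized_size} bytes after LZMA")
```

The uncompressed size is identical in both cases; only the LZMA-compressed size drops. A real RDO encoder chooses which details to sacrifice far more carefully, which is why the actual bc7enc vs. Crunch comparison needs the testing described above.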

We may have found a replacement for BC6H and ASTC too. Intel has a library that handles both, presumably at much higher speed/quality like bc7enc but again, more testing needed.

I agree that testing a big corpus would be best, but due to a lack of time I'll just try the noise texture first. The sheer speed of BC7 opens up new possibilities too, like BC7 normal maps and transparent textures, at both higher quality and higher speed than we have currently.

Edit: Yosh took care of the benchmarks, so I'll just continue to give advice on the esoteric ways of block compression.

jonnyawsom3, Nov 18 '25 01:11

@Frooxius @ProbablePrime

I ended up making a mini test suite thingy last night anyway. teehee

https://github.com/yoshiyoshyosh/bc7enc_compressonator_comparison

I used all the images in the compressonator.NET resources directory, along with a couple extra ones that you can see in the input/README.md file. Though, you'll need to download the release for the test images because of large-file git stuff that I didn't want to deal with.

My results are in output_raw.log and output_clean.log. It's pretty clear that bc7enc produces BC7 textures of similar, if not higher, quality than Compressonator 0.25 at a fraction of the encoding time, even in the absolute worst-case scenario (noise_extreme_2.png is just a completely random 4k color noise texture). It even has a "perceptual" flag to do the compression in a perceptual colorspace (for BC7 only), which is closer to what humans perceive than pure numerical error. Using this flag results in a consistently higher SSIM score than Compressonator 0.25 on any kind of "standard" texture (i.e. one that would realistically be used for an asset in-game).

It also squeezes out slightly higher quality BC1 and BC3 textures, though at the cost of ~2-3x the encoding time. However, the times for encoding BC1/BC3 are already so small that it's not that big of a difference in the grand scheme of things. As jonny said, bc7enc also has "RDO", which encodes the texture in a way much friendlier for LZMA to compress at the cost of quality, much like crunch. Preliminary testing from the Discord on whether this is better is still ongoing; so far it doesn't seem to offer much benefit in file size over crunch, if any.

The SSIM metrics should probably be taken with a grain of salt when it comes to the normal maps, as those encode data rather than visuals. You wouldn't want to use perceptual for those. bc7enc allows you to encode BC5, which can be used for theoretically higher-quality normal maps, but Resonite doesn't support BC5 yet :(

In terms of empirical testing as well: Raidriar, jonny, Hayden, Cyro, and I (at least) have all been consistently using this mod for BC7 encoding without issue, and every texture we've tried has been perceptually lossless. We're all extremely happy with it as far as I know.

Overall, I hope this is enough to convince you two that this library is extremely powerful and worth using over Compressonator for at the very least BC7 compression. It'd also be cool to see it used for all the BCn_LZMA formats (excluding BC6, as bc7enc doesn't support BC6 💔). I can probably mess around with some quality options to see if I can match the BC1/3 encoding speed of Compressonator if you want.

Let me know about any more questions/concerns!

yoshiyoshyosh, Nov 18 '25 18:11

Small update: I added BC1/3 quality changing support to the .NET bindings and also realized I forgot a sw.Reset() when doing my initial benchmark, making the BC7 non-perceptual time longer than it actually was.

I ran another benchmark and have the log here; I also tried quality 14 instead of 18 for BC1/3. That results in roughly the same speed as Compressonator for BC1/3 while losing almost no SSIM compared to quality 18, and it is still much better than Compressonator in all cases.
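The missing sw.Reset() is an easy mistake to reproduce. The sketch below assumes `sw` behaves like a .NET-style stopwatch, where stopping and starting again keeps accumulating until a reset. The `Stopwatch` class and the fake clock here are a minimal Python stand-in written for this illustration, not the benchmark's actual code.

```python
class Stopwatch:
    """Minimal stand-in: elapsed time accumulates across start/stop until reset."""

    def __init__(self, clock):
        self._clock = clock
        self._elapsed = 0.0
        self._started_at = None

    def start(self):
        if self._started_at is None:
            self._started_at = self._clock()

    def stop(self):
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    def reset(self):
        self._elapsed = 0.0
        self._started_at = None

    @property
    def elapsed(self):
        return self._elapsed


def fake_clock(ticks):
    """Return a clock function that yields the given timestamps in order."""
    it = iter(ticks)
    return lambda: next(it)


def time_two_runs(reset_between: bool):
    # Fake timestamps: run 1 spans t=0..2, run 2 spans t=10..13.
    sw = Stopwatch(fake_clock([0.0, 2.0, 10.0, 13.0]))
    sw.start()
    sw.stop()
    first = sw.elapsed
    if reset_between:
        sw.reset()
    sw.start()
    sw.stop()
    return first, sw.elapsed


print("without reset:", time_two_runs(reset_between=False))
print("with reset:   ", time_two_runs(reset_between=True))
```

Without the reset, the second measurement includes the first run's time, so it reads longer than the run actually took, which is the kind of inflation described above.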

yoshiyoshyosh, Nov 21 '25 23:11