Add QOA (Quite OK Audio) as a WAV compression mode

Open DeeJayLSP opened this issue 1 year ago • 7 comments

This is an alternative (a better one in my opinion) to #88646, with all caveats from it nullified. Closes godotengine/godot-proposals#9133 too.

Once again, according to the article announcing it, QOA was developed to be a better alternative to ADPCM formats for use in games (that is not its only purpose, but it is the best use case).

Production release templates, for some reason, had a binary size decrease of 4064 bytes on Linux.

The patches in qoa.h suppress a few warnings, allow the editor to build (since the implementation is included in both the importer and the stream), and reduce the binary size penalty (it would be a bit over 4 KiB otherwise).


This simply adds QOA as a compression mode within AudioStreamWAV.

Briefly, the differences between IMA-ADPCM and QOA within AudioStreamWAV should be:

+ IMA-ADPCM distorts many types of sound, especially at higher frequencies. The worst QOA does is add barely audible white noise to higher frequencies.

+ IMA-ADPCM isn't resampled on playback, which means sounds whose sample rate differs from the project's mix rate get heavily distorted. QOA doesn't use prediction when fetching decoded samples, so it can be resampled.

+ Since IMA-ADPCM decoding uses prediction, only the Forward loop mode is available. QOA uses prediction too, but only within frames; each frame is decoded into a buffer and samples are fetched from that buffer, so all loop modes can work. Resampling had to be adapted to avoid some unnecessary decode callbacks.

- QOA is slightly more complex than IMA-ADPCM, which should result in increased CPU usage. Despite this, it's still much faster than MP3 and Vorbis.

Supersedes #88646. Unlike that PR, this one doesn't allow QOA files to be used directly (which shouldn't be a problem, as IMA-ADPCM WAVs could never be used directly either).

DeeJayLSP avatar Apr 22 '24 13:04 DeeJayLSP

I really like this approach! (can we call this non-destructive asset management?)

But at this point I wonder why we can't have the best of both worlds and also allow importing plain QOA files, basically merging this and the superseded PR. Sorry, was there a reason for that? I can't find it in the old thread. There seemed to be a pretty good consensus there.

deralmas avatar Apr 22 '24 22:04 deralmas

Currently there is no common software capable of converting audio files into QOA for distribution. Therefore few people would use it.

Also, according to the format's creator, QOA is meant to be embeddable, and so I believe it made more sense to make it a WAV compression mode instead of another AudioStream type.

If there's demand, QOA files could be allowed to be imported in the future. IMA-ADPCM never had that demand.

DeeJayLSP avatar Apr 23 '24 00:04 DeeJayLSP

@DeeJayLSP I see, this makes perfect sense, thanks for clearing things up!

deralmas avatar Apr 23 '24 00:04 deralmas

The amount of workarounds I'm having to do for the sake of an optimal resampling...

DeeJayLSP avatar Apr 24 '24 05:04 DeeJayLSP

I explained this a few times but I wanted to leave a definitive explanation for the workarounds.

QOA frames are composed of 5120 samples. The moment it begins playing, or when it goes from 5119 to 5120 (or similar intervals), the decoder is triggered, and a buffer large enough for 5120 samples gets replaced with the new frame data.
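The frame buffer described above can be sketched roughly as follows. This is a toy model, not the code from this PR: `FrameBufferedReader`, `fetch` and `decode_calls` are hypothetical names, and the "decoder" simply slices a list instead of unpacking real QOA data.

```python
FRAME_LEN = 5120  # samples per QOA frame, as stated above


class FrameBufferedReader:
    """Toy stand-in for a frame-based decoder (hypothetical, not Godot's code)."""

    def __init__(self, samples):
        self.samples = samples    # stands in for the encoded data
        self.buffer = []          # holds exactly one decoded frame
        self.cached_frame = -1    # index of the frame currently in the buffer
        self.decode_calls = 0     # how many times the "decoder" ran

    def _decode_frame(self, frame_index):
        # A real decoder would unpack a QOA frame here; this just slices.
        start = frame_index * FRAME_LEN
        self.buffer = self.samples[start:start + FRAME_LEN]
        self.cached_frame = frame_index
        self.decode_calls += 1

    def fetch(self, position):
        # Decode only when the requested sample lives in another frame.
        frame_index = position // FRAME_LEN
        if frame_index != self.cached_frame:
            self._decode_frame(frame_index)
        return self.buffer[position % FRAME_LEN]


reader = FrameBufferedReader(list(range(3 * FRAME_LEN)))
forward = [reader.fetch(pos) for pos in range(2 * FRAME_LEN)]
forward_decodes = reader.decode_calls  # 2: one decode per frame read
```

Reading forward, the decoder only runs when the position moves into a new frame, which is the cheap case. The workarounds below are about access patterns that bounce between two frames.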

PCM8/16's resampling works by interpolating the current sample with the next. And this is a problem if we're trying to interpolate a backwards playback or a different sampling rate.
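As a rough illustration of "interpolating the current sample with the next", here is a minimal linear interpolation sketch. The function name `interpolate` and the use of floating-point positions are assumptions made for readability; the engine's actual resampler may represent the fraction differently.

```python
def interpolate(samples, position):
    """Linearly interpolate between the current sample and the next one."""
    index = int(position)        # current sample
    frac = position - index      # how far we are toward the next sample
    current = samples[index]
    following = samples[index + 1]
    return current + (following - current) * frac


pcm = [0.0, 10.0, 20.0, 30.0]
halfway = interpolate(pcm, 0.5)    # 5.0
quarter = interpolate(pcm, 1.25)   # 12.5
```

Every output sample needs two input samples, `index` and `index + 1`. With a frame-based decoder, those two can sit on opposite sides of a frame boundary.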

On a backwards playback, the following situation could occur:

// Previous sample: 5120
- Fetch sample 5119, interpolate with 5120
- Fetch sample 5118, interpolate with 5119

It crossed the 5120 interval 3 times, therefore the decoder would be triggered 3 times for two samples in a row.

The solution was simple: fetch samples backwards. Add 1 to the current position, then reverse the from/to assignment. Of course, only when playing backwards.
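To make the difference concrete, here is a small counting sketch under one reading of the fix: when playing backwards, the sample at position + 1 is fetched first and the current one second. `count_decodes` is a hypothetical helper that only counts how often the cached frame would change; it does no actual decoding.

```python
FRAME_LEN = 5120


def count_decodes(accesses, cached_frame):
    """Count how often the cached frame changes over a sequence of accesses."""
    decodes = 0
    for position in accesses:
        frame = position // FRAME_LEN
        if frame != cached_frame:
            cached_frame = frame
            decodes += 1
    return decodes


# Frame 1 is cached because the previous sample was 5120.
# Naive order: fetch the current sample, then the one after it.
naive_order = [5119, 5120, 5118, 5119]
# Reversed order: fetch position + 1 first, then the current sample.
reversed_order = [5120, 5119, 5119, 5118]

naive_decodes = count_decodes(naive_order, cached_frame=1)        # 3
reversed_decodes = count_decodes(reversed_order, cached_frame=1)  # 1
```

In the naive order the access pattern bounces between frames 1 and 0; in the reversed order it crosses the boundary once and stays in frame 0.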

This led to a side effect that when going from first to last sample in a backward loop, it would try to interpolate last with first, resulting in a pop. A solution was to simply return the last sample whenever it requests a sample beyond the length.
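A tiny sketch of why clamping avoids the pop, with made-up sample values. Both helper names are hypothetical; the wrapping variant models the unwanted "interpolate last with first" behaviour, the clamped one models returning the last sample for any request beyond the length.

```python
def next_sample_wrapping(samples, index):
    # Unwanted behaviour: the sample after the last one is the first one.
    return samples[(index + 1) % len(samples)]


def next_sample_clamped(samples, index):
    # Workaround: requests beyond the length return the last sample.
    return samples[min(index + 1, len(samples) - 1)]


samples = [0.9, 0.5, 0.1, -0.8]
last = len(samples) - 1
frac = 0.5

wrapped = samples[last] + (next_sample_wrapping(samples, last) - samples[last]) * frac
clamped = samples[last] + (next_sample_clamped(samples, last) - samples[last]) * frac
```

With wrapping, the interpolated value jumps toward the first sample (a sudden discontinuity, heard as a pop). With clamping, it stays at the value of the last sample.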

Due to the way resampling works, some samples end up being repeated, with different fractions. If the repeated sample happens to be the last one in a QOA frame (guaranteed if the audio's mix rate is half the project's), the same problem as above would occur.

- Fetch sample 5119, interpolate with 5120
- Fetch sample 5119, interpolate with 5120 // Again but with a different fraction

The solution I came up with was to store the current sample and return it directly if the next request is for the same position, instead of going through all the checks again, which is what causes the problem. I don't think this is the best solution, but at least it solves the problem.
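The idea of remembering the last requested sample can be sketched like this. It is a simplified model, not the PR's code: `Reader`, `current_and_next` and `remember_last` are hypothetical names, and "decoding" is reduced to counting frame switches.

```python
FRAME_LEN = 5120


class Reader:
    """Counts frame switches; optionally remembers the last current sample."""

    def __init__(self, samples, remember_last):
        self.samples = samples
        self.remember_last = remember_last
        self.cached_frame = -1
        self.decode_calls = 0
        self.last_position = None
        self.last_value = None

    def _fetch(self, position):
        frame = position // FRAME_LEN
        if frame != self.cached_frame:
            self.cached_frame = frame
            self.decode_calls += 1
        return self.samples[position]

    def current_and_next(self, position):
        if self.remember_last and position == self.last_position:
            current = self.last_value  # same request as before: skip the checks
        else:
            current = self._fetch(position)
            self.last_position, self.last_value = position, current
        return current, self._fetch(position + 1)


def decodes_for_repeated_request(remember_last):
    reader = Reader(list(range(2 * FRAME_LEN)), remember_last)
    reader.current_and_next(5119)  # last sample of frame 0, next is in frame 1
    reader.current_and_next(5119)  # same sample again, different fraction
    return reader.decode_calls


without_memo = decodes_for_repeated_request(False)  # 4
with_memo = decodes_for_repeated_request(True)      # 2
```

Without the stored sample, the repeated request goes back to frame 0 and then forward to frame 1 again. With it, the second request never leaves frame 1.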

There might be ways to optimize this. In a scenario where resampling isn't used, QOA decoding could be just this.

DeeJayLSP avatar Apr 24 '24 06:04 DeeJayLSP

Is it normal for binary sizes to decrease by over 4KB after implementing a feature like this?

(I did test to see if it would work)

DeeJayLSP avatar Apr 29 '24 08:04 DeeJayLSP

For a while now I've been unable to find ways to optimize or fix potential problems in this implementation, so this is the point where I say I believe it's in its best state.

DeeJayLSP avatar May 02 '24 00:05 DeeJayLSP

Thanks!

akien-mga avatar May 02 '24 10:05 akien-mga