
Error is huge for this file.

Open photopea opened this issue 4 months ago • 5 comments

I have this mono audio file with 5120 (= 256 × 20) samples. The 79th sample is 22528.

When I encode it into QOA, the decoded samples at indices 74 to 240 all become -32768. Is there a way to avoid it?

My initial history is 0, 0, 0, 0 and my initial weights are 0, 0, -8192, 16384.

sound.zip

photopea avatar Aug 12 '25 08:08 photopea

The file contains consecutive values such as ... 320, 22528 ..., so the difference between two samples is over 22,000. But in QOA, the biggest scale factor is 2048 and the biggest quantized step is 7, so the biggest possible difference is only about 14,000 (2048 × 7 = 14,336), which is less than a quarter of the full range of -32,768 ... 32,767.
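To make the arithmetic concrete, here is a small sketch that uses only the numbers quoted in this comment (scale factor 2048, quantized step 7, samples 320 and 22528). The variable names are mine, for illustration only.

```python
# Numbers taken from the comment above; the names are illustrative.
MAX_SCALEFACTOR = 2048   # largest QOA scale factor
MAX_QUANT_STEP = 7       # largest quantized step

max_step = MAX_SCALEFACTOR * MAX_QUANT_STEP   # 14336
full_range = 32767 - (-32768) + 1             # 65536 distinct 16-bit values
jump = 22528 - 320                            # 22208, the jump in the file

print(max_step, jump, max_step / full_range)  # 14336 22208 0.21875
```

The jump of 22,208 exceeds the largest single step of 14,336, so one quantized value cannot cover it unless the predictor already accounts for part of it.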

photopea avatar Aug 12 '25 09:08 photopea

This sound file is the worst case for QOA: high frequency, high amplitude and a waveform that is badly predicted by the LMS. Not much you can do about it without changing the sound itself (e.g. maybe lower amplitudes).

The scale factor limit of 2048 not capturing the full range is by design:

https://github.com/phoboslab/qoa/blob/ae07b57deb98127a5b40916cb57775823d7437d2/qoa.h#L193-L198

The assumption that the LMS will do a good enough job, so that the 2**14 range will suffice, doesn't always hold, as demonstrated by your sound file.
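For readers following along, here is a minimal sketch of the LMS predictor this refers to. It assumes the usual QOA layout of four weights and four history samples, a prediction shift of 13, and a weight update step of `residual >> 4`; those constants are my understanding of the reference code and should be checked against qoa.h. The function names are illustrative.

```python
def lms_predict(weights, history):
    # Dot product of weights and history, scaled down by 2**13 (assumed shift).
    return sum(w * h for w, h in zip(weights, history)) >> 13

def lms_update(weights, history, sample, residual):
    # Nudge each weight by a small delta whose sign follows the history sample,
    # then shift the reconstructed sample into the history.
    delta = residual >> 4
    new_weights = [w + (-delta if h < 0 else delta) for w, h in zip(weights, history)]
    new_history = history[1:] + [sample]
    return new_weights, new_history

initial_weights = [0, 0, -8192, 16384]

# With the initial weights the prediction is 2*h[3] - h[2], a linear extrapolation.
print(lms_predict(initial_weights, [0, 0, 100, 200]))   # 300

# Hypothetical flat history before the jump: the prediction stays at 320,
# so the residual for a next sample of 22528 would be 22208.
prediction = lms_predict(initial_weights, [0, 0, 320, 320])
print(22528 - prediction)                                # 22208
```

In a real encoder the weights at that point in the file will have drifted away from their initial values, so the second example only illustrates why a sudden jump after a flat stretch leaves a residual larger than the quantizer can express.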

phoboslab avatar Aug 12 '25 10:08 phoboslab

Are you getting the same values as me, i.e. a string of -32768 between indices 74 and 240? I just want to make sure that I have implemented my encoder correctly.

Even if I make sure that the difference between consecutive input samples is less than 4,000, this error still happens (the difference between original and encoded samples is over 30,000).

photopea avatar Aug 12 '25 12:08 photopea

> Are you getting the same values as me, i.e. a string of -32768 between indices 74 and 240?

No. Using the encoder here produces a somewhat noisy output, but no clicks/pops. The encoder here doesn't just use the scalefactor that produces the lowest error, but also tries to keep the LMS weights down. The ranking for each scalefactor is computed with the squared error + squared weights (if these exceed a certain value). See here: https://github.com/phoboslab/qoa/blob/master/qoa.h#L429-L444
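A rough sketch of that ranking, per sample, might look like the following. The function names are illustrative, and the default `shift` and `threshold` values are my recollection of the reference encoder, so treat them as assumptions and check the linked lines for the actual numbers.

```python
def weights_penalty(weights, shift=18, threshold=0x8FF):
    # Sum of squared weights, scaled down; only the part above the threshold counts.
    # shift and threshold are assumed values, not confirmed by this thread.
    energy = sum(w * w for w in weights) >> shift
    return max(0, energy - threshold)

def rank_term(sample, reconstructed, weights):
    # Contribution of one sample to a scalefactor's rank:
    # squared error plus squared weights penalty.
    error = sample - reconstructed
    penalty = weights_penalty(weights)
    return error * error + penalty * penalty

# Small weights: no penalty, so the rank is just the squared error.
print(rank_term(100, 90, [0, 0, -8192, 16384]))          # 100

# Large weights: the penalty dominates, so this scalefactor ranks worse
# even though its squared error is identical.
print(rank_term(100, 90, [32768, 32768, 32768, 32768]))  # 198274661
```

The encoder would sum `rank_term` over all samples of a slice for each candidate scalefactor and keep the scalefactor with the lowest total.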

> just want to make sure that I have implemented my encoder correctly.

If your encoder produces valid QOA files, then it's probably correct :) The thing is, there are a lot of techniques you can employ to increase the quality of your encoder. Implementing a penalty for large weights, as done here, is one of those.

Noise shaping is another. I experimented with this in a branch here. It helps for some samples, but maybe sounds worse for others (very subjective, though).

If you are okay with sacrificing time/complexity for higher output quality, you could even implement an encoder that looks ahead a number of slices and uses those scalefactors that produce the lowest error over this slice and a bunch of future slices. As an example, the Xtreme Quality ADPCM Encoder implements a lookahead for the MS ADPCM format.
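One way such a lookahead could be structured is sketched below. Everything here is hypothetical: `encode_slice` stands in for a real slice encoder that returns the squared error and the resulting LMS state, and future slices are encoded greedily as a simplification. It is not how the reference encoder or the Xtreme Quality ADPCM Encoder works.

```python
def best_greedy(encode_slice, state, samples, scalefactors):
    # Lowest-error scalefactor for a single slice: (error, next_state, scalefactor).
    best = None
    for sf in scalefactors:
        err, nxt = encode_slice(state, samples, sf)
        if best is None or err < best[0]:
            best = (err, nxt, sf)
    return best

def pick_with_lookahead(encode_slice, state, slices, scalefactors, lookahead=2):
    # Choose the scalefactor for slices[0] that minimises the total error over
    # slices[0] plus up to `lookahead` following slices (encoded greedily).
    best_total, best_sf = None, None
    for sf in scalefactors:
        total, nxt = encode_slice(state, slices[0], sf)
        for future in slices[1:1 + lookahead]:
            err, nxt, _ = best_greedy(encode_slice, nxt, future, scalefactors)
            total += err
        if best_total is None or total < best_total:
            best_total, best_sf = total, sf
    return best_sf

def toy_encode_slice(state, samples, sf):
    # Toy stand-in: scalefactor 0 looks slightly better on the first slice
    # but leaves a state that makes the next slice much worse.
    if state is None:
        return (1, "bad") if sf == 0 else (2, "good")
    return (100, state) if state == "bad" else (0, state)

slices = [[0], [1]]
print(best_greedy(toy_encode_slice, None, slices[0], [0, 1])[2])                 # 0
print(pick_with_lookahead(toy_encode_slice, None, slices, [0, 1], lookahead=1))  # 1
```

The cost grows with the number of scalefactors times the lookahead depth, which is the time/complexity trade-off mentioned above.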

As with many lossy formats, the quality of the encoded files varies with the quality of the encoder. This is different from (many) lossless formats, such as QOI, where there is only one sensible way to encode a file.

phoboslab avatar Aug 12 '25 13:08 phoboslab

I tried to use your "weights penalty", but I am still getting a long sequence of -32768.

Did you look at the actual numbers, or did you just try to play the encoded sound?

photopea avatar Aug 13 '25 08:08 photopea