Should I normalize when reading int as f32?
Hi, thanks for making a great crate.
I am new to Rust so I may have misunderstood something. This is about the feature that was added in https://github.com/ruuda/hound/pull/37 .
When a wav file in f32 format is loaded, samples in the range -1.0 to +1.0 are expected. However, when reading a wav file in 16-bit int format as f32, samples in the range -32768.0 to +32767.0 are returned. When reading as f32, I think it is common to normalize and return values in the range -1.0 to +1.0. If you think it would be better to normalize, I can help.
The reader returns the raw data from the file. What I would do for this case is match on the sample format, and apply a normalizing factor when it’s an integer format. That factor will depend on the bit depth of the input file.
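For example, something along these lines (just a sketch, not part of hound; `input.wav` is a placeholder, and reading integer samples as `i32` keeps it independent of the file's bit depth):

```rust
use hound::{SampleFormat, WavReader};

fn main() -> Result<(), hound::Error> {
    let mut reader = WavReader::open("input.wav")?;
    let spec = reader.spec();

    let samples: Vec<f32> = match spec.sample_format {
        // Float files are already expected to be in [-1.0, 1.0].
        SampleFormat::Float => reader.samples::<f32>().collect::<Result<_, _>>()?,
        // Integer files: divide by the full-scale value for the file's
        // bit depth (32768 for 16-bit, 8388608 for 24-bit, ...).
        SampleFormat::Int => {
            let full_scale = (1_i64 << (spec.bits_per_sample - 1)) as f32;
            reader
                .samples::<i32>()
                .map(|res| res.map(|s| s as f32 / full_scale))
                .collect::<Result<_, _>>()?
        }
    };

    println!("read {} normalized samples", samples.len());
    Ok(())
}
```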
Yes, I understand how to normalize, but shouldn't `WavReader.samples::<f32>()` return a normalized result?
I think an i16-to-f32 conversion is essentially expected to be a conversion from a fixed-point number to a floating-point number; the data itself should not change in the conversion.
For example, when you execute code like the following, what you would expect is that numbers in the range -32768 to +32767 are converted to -1.0 to +1.0.
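A minimal sketch of the kind of code in question, assuming a 16-bit integer PCM file at a placeholder path `input16.wav`:

```rust
use hound::WavReader;

fn main() -> Result<(), hound::Error> {
    let mut reader = WavReader::open("input16.wav")?;
    for sample in reader.samples::<f32>() {
        // Expectation: values in the range -1.0 to +1.0.
        println!("{}", sample?);
    }
    Ok(())
}
```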
The current implementation instead returns -32768.0 to +32767.0, i.e. the integer value simply cast to f32:
https://github.com/ruuda/hound/blob/02e66effb33683dd6acb92df792683ee46ad6a59/src/read.rs#L1361-L1365
Specifically, the following changes are needed: https://github.com/AkiyukiOkayasu/hound/commit/ad25cd54d5d7a90ed2c395ae9699e5c2b09e29ca
This is a breaking change, but fortunately the feature is not yet published on crates.io and only exists on the master branch. I think this is a good time to make the change, but I would like to know your thoughts.
> but shouldn't `WavReader.samples::<f32>()` return a normalized result?
I’m not sure, there are good arguments to be made for both sides.
- If we scale to [-1, 1] when reading `i24` into `f32`, should we also scale to [-2^15, 2^15] when reading `i8` into `i16`?
- If the answer is yes, does that mean we should scale to [-2^31, 2^31] when reading `i24` into `i32`?
- The latter seems unexpected, I think that's a bad idea. But then for consistency, we also shouldn't scale in the other cases.
> but I would like to know your thoughts.
Now that you pointed it out, I think it may have been a mistake to allow reading integer samples into f32 in the first place, because neither of the two behaviors is obviously the right one to me. If we disallow it again, then the user will have to do the cast (and possibly the normalization). It’s a bit more code, but at least the behavior would be obvious.
It is not a matter of which is right, but I would like to share my view here.
I think that 0 dBFS before conversion should still be 0 dBFS after conversion, whether the conversion is from int8 to int16, int24 to int32, or even int16 to f32. Scaling is necessary for that; for example, Audacity behaves this way when converting. However, this may be too breaking a change, and I am not sure it should be made now. Perhaps not.
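As an illustration of what I mean (plain Rust, not hound code): scaling so that full scale maps to full scale, e.g. by 2^8 for int8 to int16 and by 1/2^15 for int16 to f32:

```rust
// Full-scale-preserving conversions: 0 dBFS in maps to 0 dBFS out.
fn i8_to_i16(s: i8) -> i16 {
    (s as i16) << 8 // scale by 2^8: -128..127 -> -32768..32512
}

fn i16_to_f32(s: i16) -> f32 {
    s as f32 / 32768.0 // scale by 1/2^15: -32768..32767 -> -1.0..~0.99997
}

fn main() {
    assert_eq!(i8_to_i16(-128), -32768); // negative full scale is preserved
    assert_eq!(i16_to_f32(-32768), -1.0);
    println!("{} {}", i8_to_i16(127), i16_to_f32(32767));
}
```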
Also, this is essentially a fixed-point-to-fixed-point or fixed-point-to-floating-point format conversion. I had thought it would be useful to support that functionality as well, but your comment makes me question whether it is something hound should handle at all.
Anyway, thank you for taking the time to answer.