
unexpected 'normalization' of integer data

Open gbeckers opened this issue 3 years ago • 3 comments

I was aware that normalization from int to float and back is something to think about and can have unintended consequences. What surprises me is that integer data also appears to be 'normalized' when you write to a bit depth other than the two integer dtypes that soundfile supports on the Python side, int16 and int32. This is especially important to be aware of when writing the often-used PCM_24 and PCM_16, but also when writing to 8-bit formats. That is, the value 10 in your numpy array will not be written as 10 when writing to PCM_U8. An example is given below.

import numpy as np
import soundfile as sf
print('soundfile version', sf.__version__)
print('libsndfile version', sf.__libsndfile_version__)
# small int32 values, written to a 24-bit file and read back as int32
ar = np.array([[0], [10], [128], [255], [511], [512], [513]], dtype='int32')
sf.write('test.wav', ar, 44100, 'PCM_24')
data, fs = sf.read('test.wav', dtype='int32')
print(ar.flatten())
print(data)

soundfile version 0.9.0
libsndfile version 1.0.28
[ 0 10 128 255 511 512 513]
[ 0 0 0 0 256 512 512]

I can see why this is so, but my expectation would be that the values are written as such, and clipped when you have values in your int32 array that are higher/lower than the max/min of an int24.

I'm not saying the current behavior is wrong, but it may be good to mention in the docs that a type of normalization takes place, even when using only ints for input and output.

gbeckers, Aug 06 '20 11:08

Yes, this is a somewhat confusing property of libsndfile. Would you like to contribute the doc change you mentioned as a pull request?

bastibe, Aug 11 '20 06:08

I agree that this can be confusing, and it would be good to explain this in the docs, but I think there are very good reasons to do it this way.

You can think about it like this: There are only two integer formats supported by the soundfile module: int16 and int32. Each of those has its own minimum and maximum value.

The main point is: a signal with maximum amplitude should always lead to the same playback volume, regardless of which data type you are using. If this were not the case, playback of 32-bit PCM files would be extremely loud, while playback of 16-bit PCM files would be barely audible.

I think this would be more confusing than the current behavior, but on top of that it would also be prone to destroying loudspeakers and ears.

The handling of PCM_24 and PCM_U8 is just a logically consistent consequence of this behavior. Each format has a given maximum and minimum value, and the values are mapped appropriately from/to the maximum/minimum values of int16/int32.
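
As a rough illustration of that mapping, here is a pure-Python model of an int32 round trip through a narrower PCM format. It assumes the conversion simply keeps the most significant bits of each int32 sample, which is consistent with the PCM_24 output reported at the top of this thread; the helper name `roundtrip_int32` is made up for this sketch and is not part of soundfile.

```python
def roundtrip_int32(value, bits):
    """Model writing an int32 sample to a `bits`-wide PCM format and reading it back.

    Assumption: only the top `bits` bits of the int32 value are stored,
    and reading shifts them back into the upper bits of an int32.
    """
    shift = 32 - bits
    stored = value >> shift   # what ends up in the file (low bits are dropped)
    return stored << shift    # what comes back when reading as int32


samples = [0, 10, 128, 255, 511, 512, 513]
print([roundtrip_int32(v, 24) for v in samples])
# -> [0, 0, 0, 0, 256, 512, 512]
```

With `bits=24` the lowest 8 bits are lost, so any value below 256 comes back as 0 and everything else is rounded down to a multiple of 256.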

If you do actually want to keep the same numerical values, you'll have to scale the values yourself. This is normally very simple and fast; most of the time it's just a shift by 8 bits. For an example, see https://github.com/bastibe/SoundFile/issues/263#issuecomment-610386841.
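
A minimal sketch of that manual scaling for PCM_24 could look like the following. It is not taken from the linked comment; the helper names are hypothetical, and the clipping to the 24-bit range is an added assumption that mirrors the expectation stated in the original report. soundfile is imported inside the round-trip helper only so that the two scaling functions can be tried with NumPy alone.

```python
import numpy as np

PCM24_SHIFT = 8  # int32 has 8 more bits than a 24-bit sample


def to_int32_for_pcm24(samples):
    """Clip to the 24-bit range and move the values into the upper 24 bits of int32."""
    samples = np.asarray(samples, dtype=np.int64)
    clipped = np.clip(samples, -2**23, 2**23 - 1)
    return (clipped << PCM24_SHIFT).astype(np.int32)


def from_int32_after_pcm24(data):
    """Undo the shift on int32 data read back from a PCM_24 file."""
    return np.asarray(data, dtype=np.int32) >> PCM24_SHIFT


def write_read_pcm24(path, samples, samplerate):
    """Hypothetical round trip that preserves the numerical sample values."""
    import soundfile as sf  # imported here so the helpers above work without it

    sf.write(path, to_int32_for_pcm24(samples), samplerate, subtype='PCM_24')
    data, _ = sf.read(path, dtype='int32')
    return from_int32_after_pcm24(data)
```

Under these assumptions, `write_read_pcm24('test.wav', [0, 10, 128, 255, 511, 512, 513], 44100)` should return the original values rather than multiples of 256.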

mgeier, Aug 18 '20 12:08

Indeed, I can see the rationale behind it, and for most use cases this is the best option. What happens to numpy integers when they are saved as PCM could be documented in more depth, but it's probably a gotcha for only a small minority of users. I can help with some documentation, but not within the next few weeks. Thanks for providing a very nice library!

gbeckers, Aug 18 '20 19:08