python-soundfile

Soundfile read/write wav is not symmetric with default arguments

Open jon-petter opened this issue 2 years ago • 3 comments

I came across some unexpected behavior in soundfile (version 0.12.1) read/write.

If you have the following float array:

import numpy as np
data = np.array([0, -1, 0, 1, 0, -1, 0, 1, 0, -1, 0], dtype=np.float32)
print((data*(2**15 - 1)))

[     0. -32767.      0.  32767.      0. -32767.      0.  32767.      0.
 -32767.      0.]

If you now write this data to a WAV file and read it back with soundfile (using default arguments for both write and read), you get:

import soundfile as sf
sf.write('test.wav', data, 44100)
data, sample_rate = sf.read('test.wav')
print((data*(2**15 - 1)))

[     0.         -32767.              0.          32766.00003052
      0.         -32767.              0.          32766.00003052
      0.         -32767.              0.        ]

So my pure, maximum-amplitude sine wave has now been reduced in amplitude on the positive side, and a tiny DC offset has been introduced.

I understand that, when writing to PCM16, there would be quantization artifacts, but I was not expecting the positive and negative sides of the signal to be scaled differently (to this extent).

Is this scaling applied in soundfile code, or in some of the libs it builds upon?

My main question is: why is this asymmetric scaling not reversed when using soundfile.read() with dtype="float64"?

jon-petter, Oct 16 '23 08:10

This is the unfortunate reality of integer numbers. The lowest possible 16-bit number is -2^15, but the highest possible is 2^15-1. When dealing with float inputs, you have to apply some scaling, and there is no correct answer.

  • Do you scale positive numbers differently from negative numbers? There will be (tiny) discontinuities at the zero crossings.
  • Do you scale to 2^15-1? Then you lose one value for negative numbers.
  • Do you scale to 2^15? Then you lose one value for positive numbers.

There's no right answer. But in reality, the differences between these options are imperceptible.
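To make the three options concrete, here is a minimal sketch in plain Python. The function names are made up for illustration, and this is not taken from libsndfile; it only shows what each choice does to full-scale input.

```python
# Hypothetical helpers illustrating the three float -> int16 scaling choices.
INT16_MIN = -2**15      # -32768
INT16_MAX = 2**15 - 1   #  32767

def clip_int16(n):
    """Clamp an integer to the representable 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, n))

def to_int16_asymmetric(x):
    """Scale negatives by 2^15 and positives by 2^15 - 1 (kink at zero)."""
    return round(x * (2**15 if x < 0 else 2**15 - 1))

def to_int16_scale_32767(x):
    """Scale everything by 2^15 - 1; -32768 is never produced."""
    return round(x * (2**15 - 1))

def to_int16_scale_32768(x):
    """Scale everything by 2^15 and clip; +1.0 cannot be represented."""
    return clip_int16(round(x * 2**15))

for x in (-1.0, 1.0):
    print(x,
          to_int16_asymmetric(x),
          to_int16_scale_32767(x),
          to_int16_scale_32768(x))
```

With the third option, +1.0 would scale to 32768 and has to be clipped to 32767, which is the "lost" positive value mentioned above.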

Soundfile does not implement this, but merely passes the data on to libsndfile, which does the transformation.

If you need a perfect float representation, you could always use a native float format, such as MAT5 or WAV with the FLOAT subtype. (FLAC, as far as I know, only stores integer PCM, so it would not help here.)

bastibe, Oct 19 '23 06:10

Yes, I understand that. I was mostly wondering why the scaling is different on write and read. So is this a libsndfile issue, then?

At least, this is the behavior I observe:

  • Write: different scaling factor for negative and positive values
  • Read: equal scaling factor for all values

Anyhow, I understand that I'm complaining about a 1/2**15 max quantization error vs. a 1/2**16 max error, and these differences, as you say, are probably imperceptible.

jon-petter, Oct 19 '23 07:10

The problem is not that read and write are different, but that +1 is not representable. If you use values <1, it should be symmetric.
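A small emulation in plain Python may help here. It assumes the writer scales by 2^15 and clips to the int16 range, and the reader divides by 2^15. That assumption is inferred from the numbers reported in this thread, not taken from the libsndfile source, and the function names are hypothetical.

```python
SCALE = 2**15  # assumed scale factor for both directions

def write_pcm16(x):
    """Assumed write path: scale by 2^15, round, clip to [-32768, 32767]."""
    return max(-SCALE, min(SCALE - 1, round(x * SCALE)))

def read_pcm16(n):
    """Assumed read path: divide by 2^15."""
    return n / SCALE

def roundtrip(x):
    return read_pcm16(write_pcm16(x))

# +1.0 is clipped to 32767 on write, so it comes back slightly below 1.
print(roundtrip(1.0) * (2**15 - 1))
# -1.0 maps to -32768 and comes back exactly.
print(roundtrip(-1.0) * (2**15 - 1))
# Values strictly inside (-1, 1) survive symmetrically.
print(roundtrip(0.5), roundtrip(-0.5))
```

Under these assumptions the same scale factor is used in both directions; the asymmetry in the report comes only from +1.0 being clipped.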

bastibe, Oct 21 '23 07:10