python-soundfile
WAV format with MS_ADPCM subtype always reads back a longer array than what was written
We are running into a funky little issue using soundfile in a different project here:
https://github.com/justinsalamon/scaper/issues/94. Basically, when an audio file is WAV with subtype MS_ADPCM, the shape of the numpy array we write to disk and the shape of the numpy array we read back from disk are not the same.
import soundfile as sf
import numpy as np
import tempfile

_format = 'WAV'
_subtype = 'MS_ADPCM'
sr = 16000

for l in [1600, 16000, 32000, 64000, 100000, 128000]:
    original_shape = (l,)
    audio = np.zeros(original_shape)

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=True) as tmpfile:
        sf.write(tmpfile.name, audio, sr, subtype=_subtype, format=_format)
        audio, sr = sf.read(tmpfile.name)
        print('What I got back from sf.read\t', audio.shape)
        print('What I meant to write to disk\t', original_shape)
        print()
Running the code above results in the following bizarre set of input and output shapes that I can't make heads or tails of:
What I got back from sf.read (2024,)
What I meant to write to disk (1600,)
What I got back from sf.read (16192,)
What I meant to write to disk (16000,)
What I got back from sf.read (32384,)
What I meant to write to disk (32000,)
What I got back from sf.read (64768,)
What I meant to write to disk (64000,)
What I got back from sf.read (100188,)
What I meant to write to disk (100000,)
What I got back from sf.read (128524,)
What I meant to write to disk (128000,)
I thought I'd open an issue about it. Let me know if you need any more information!
From your examples, I would assume that there is a built-in block size in MS_ADPCM, likely 1012 frames: every length you read back is the written length rounded up to the next multiple of 1012 (for example, 2024 = 2 × 1012 and 16192 = 16 × 1012). This is either inherent to MS_ADPCM or a limitation of libsndfile. Either way, there is probably nothing you can do within SoundFile short of using a different subtype.
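The block-size guess can be checked against the shapes reported above with a few lines of plain Python. This is only a sketch: the 1012-frame block is inferred from the numbers in this thread, not taken from the MS_ADPCM specification or the libsndfile source, and `padded_length` is a name made up for illustration.

```python
# Assumed block size in frames, inferred from the output posted in this issue.
BLOCK = 1012


def padded_length(n_frames, block=BLOCK):
    """Round n_frames up to the next whole multiple of block (integer ceiling)."""
    return -(-n_frames // block) * block


# Lengths written to disk and lengths returned by sf.read, copied from the report.
written = [1600, 16000, 32000, 64000, 100000, 128000]
read_back = [2024, 16192, 32384, 64768, 100188, 128524]

for n, got in zip(written, read_back):
    print(n, padded_length(n), got, padded_length(n) == got)
```

If the guess is right, the last column is `True` for every row, which is the case for all six lengths in the report.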
(What process requires MS_ADPCM files? I have never heard of that format.)
Interesting, that sounds about right. Neither had I until the bug in the linked issue was reported. Tracking it down led me to this behavior in SoundFile/libsndfile, so I thought I'd report it to keep a record of it somewhere.
We are using audio files from https://freesound.org/. Freesound places essentially no restrictions on the formats or types of audio files you can upload. So I'm guessing the file in question came from some sort of niche field recorder or perhaps an old phone, got encoded with that unusual subtype, and was then uploaded. Re-encoding the file with ffmpeg into a more common format fixed the issue.
Thanks!