python-soundfile
libsndfile (soundfile) for mp3 not float32 but float64
https://github.com/librosa/librosa/issues/1584 Audio output from libsndfile (soundfile) for MP3 files is float64, not float32. Because of this, unless we force dtype=float64, we get an empty array.
Printing the MP3 with the default dtype (float32):
audio_test, _ = librosa.load('./g.mp3', mono=False, res_type='kaiser_fast', sr=sr)
[]
0
(2, 0)
tensor([], size=(2, 0))
Printing the MP3 with dtype=float64:
audio_test, _ = librosa.load('./g.mp3', mono=False, res_type='kaiser_fast', sr=sr, dtype=np.float64)
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
21182464
(2, 10591232)
tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
MP3 file used for the test: abba
When reading directly with soundfile and dtype=float32, we get an empty array. With dtype=float64, we get a filled array.
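import soundfile as sf

audio_path = "./g.mp3"  # assumed: the same test file as in the librosa example

with sf.SoundFile(audio_path, "r") as file_:
    # _prepare_read is a private soundfile helper; it returns the number of frames to read
    frames = file_._prepare_read(0, None, -1)
    # dtype = "float64"  # returns a filled array
    dtype = "float32"  # returns an empty array for this file
    waveform = file_.read(frames, dtype, always_2d=True)
    sample_rate = file_.samplerate
    print(sf.info(audio_path, verbose=True))
    print(sf.available_subtypes(format=None))
    print(sf.available_formats())
    print(waveform, sample_rate)
    print(file_.subtype)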
I can confirm this behavior for this particular file. It does not happen for other files, however. Indeed it does not happen if you save the float64 data as a new MP3 file and try to read that.
I'm afraid this is a libsndfile problem, and there's nothing soundfile can do about it. But please let me know if I'm mistaken on that.
Hi, thank you for your answer. I've already come across a bunch of MP3 files with the same behavior, and I downloaded them from different sources. I even exported an audio file to MP3 from a video editor (Sony Vegas) and got the same problem. Maybe force dtype=float64 until this problem is fixed? It often causes problems for libraries that use soundfile for MP3 loading (for example librosa, PyTorch audio, etc.).
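The workaround suggested above could look roughly like the following. This is only a sketch, not something soundfile or librosa ships: `read_with_float64_fallback` is a hypothetical helper name, and it assumes the reader passed in behaves like `soundfile.read` (accepts `dtype` and `always_2d`, returns `(data, samplerate)`). It tries float32 first and only falls back to float64 plus a cast when the float32 read comes back empty.

```python
def read_with_float64_fallback(read, path):
    """Read audio as float32; if the result is empty, retry as float64 and cast.

    `read` is expected to behave like soundfile.read (an assumption of this
    sketch): it takes `dtype` and `always_2d` and returns (data, samplerate).
    """
    data, sample_rate = read(path, dtype="float32", always_2d=True)
    if len(data) == 0:
        # Affected MP3 files come back empty as float32, so read them as
        # float64 and convert afterwards.
        data, sample_rate = read(path, dtype="float64", always_2d=True)
        data = data.astype("float32")
    return data, sample_rate
```

With soundfile installed, this would be called as `read_with_float64_fallback(sf.read, "./g.mp3")`. A file that really contains no frames stays empty after the retry, so the fallback does not hide that case.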
https://github.com/libsndfile/libsndfile/issues/880#issuecomment-1264130930
That's good to hear! As soon as libsndfile releases a new version (and the build systems catch up, so we can actually use it), we'll push an update to soundfile as well.
I ran into this issue as well when I realized it was the root cause for a librosa issue. Looking forward to its resolution.
@bastibe Do you have any news for this open issue?
Not yet, sorry. If you want to help, head on over to https://github.com/bastibe/libsndfile-binaries/tree/manylinux-binaries and help me adjust the CI scripts to build updated binaries.
I just had a look at the branch, but it's not clear what exactly you want adjusted.
I tried bumping the version of libsndfile and manually triggered the GH action. Everything ran smoothly and the binaries were saved as an artifact. Do you want the workflow to automatically commit the new binaries to the branch?
If that's all that it takes, I'll gladly merge a pull request with the new version numbers. Thank you!
Sorry, my life has been terribly busy lately, not much time left for OSS work.
No need to apologize, I'm happy to help you out. Let's continue here: https://github.com/bastibe/libsndfile-binaries/pull/20
A new release of python-soundfile is in testing now. Please check out https://github.com/bastibe/python-soundfile/pull/364 and see if it fixes your issue.
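When testing a new release, it helps to report exactly which soundfile and libsndfile builds are loaded. A small sketch for that, where `describe_soundfile` is a hypothetical helper name and the dictionary keys are chosen just for illustration:

```python
def describe_soundfile():
    """Return version info for the installed soundfile, or None if unavailable."""
    try:
        import soundfile as sf
    except (ImportError, OSError):
        # ImportError: soundfile is not installed.
        # OSError: the libsndfile shared library could not be loaded.
        return None
    return {
        "soundfile": sf.__version__,
        "libsndfile": sf.__libsndfile_version__,
        # available_formats() maps format names to descriptions
        "mp3_supported": "MP3" in sf.available_formats(),
    }


print(describe_soundfile())
```

Pasting the printed result into a bug report makes it clear whether the bundled libsndfile was actually updated.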
I'm trying to use the whisperX application, which depends on pyannote-audio and the underlying torchaudio, so soundfile is a dependency as well. Earlier I had an issue where MP3 files were not being recognized because my soundfile version was out of date. I updated to version 0.12.0 (pyannote-audio currently caps its soundfile requirement at 0.12) and was able to process MP3s with no issue.
However, most of my audio is in the M4A format. Currently I'm converting my audio files to MP3 so that I can process them with whisperX, but it would be nice to have M4A support without needing to convert my audio files every time. Is there anything I can do to help mitigate this?
Format support in soundfile is entirely up to libsndfile. As far as I know, there are currently no plans to support AAC/MP4/M4A audio files in libsndfile, as they seem encumbered by patents.
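In the meantime, the conversion step can at least be scripted. The sketch below assumes the `ffmpeg` command-line tool is installed and on the PATH, which is not something mentioned in this thread. It converts to WAV rather than MP3 to avoid a second lossy encode, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst=None):
    """Build an ffmpeg command that converts `src` to WAV.

    If `dst` is not given, the output sits next to the input with a .wav suffix.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix(".wav")
    # -y overwrites an existing output file, -i names the input file
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_to_wav(src, dst=None):
    """Run ffmpeg (assumed to be on PATH) and return the output path."""
    cmd = build_ffmpeg_command(src, dst)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

The resulting WAV file can then be read by soundfile as usual, for example `convert_to_wav("talk.m4a")` followed by `sf.read("talk.wav")`.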
Please follow https://github.com/libsndfile/libsndfile/issues/389 for more info on support for AAC in M4A (MP4) containers. Patents should not be an issue (anymore).
That's terrific news! Thank you for sharing.