datasets RuntimeError when using torchaudio 0.12.0 to load MP3 audio file

The current version of torchaudio (0.12.0) raises a RuntimeError when the sox_io backend is used but the non-Python dependency sox is not installed: https://github.com/pytorch/audio/blob/2e1388401c434011e9f044b40bc8374f2ddfc414/torchaudio/backend/sox_io_backend.py#L21-L29

def _fail_load(
    filepath: str,
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    format: Optional[str] = None,
) -> Tuple[torch.Tensor, int]:
    raise RuntimeError("Failed to load audio from {}".format(filepath))

Maybe we should raise a more actionable error message so that the user knows how to fix it.
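
A minimal sketch of what a more actionable error could look like, assuming the decoding call is wrapped at the point where it is made. The helper name `load_audio_with_hint` and the hint wording are hypothetical and not part of datasets or torchaudio; the loader is passed in as an argument so the sketch runs without torchaudio installed.

```python
from typing import Any, Callable


def load_audio_with_hint(loader: Callable[[str], Any], filepath: str) -> Any:
    """Call `loader(filepath)` and re-raise a bare RuntimeError with a hint.

    Hypothetical helper: `loader` stands in for a function such as torchaudio.load.
    """
    try:
        return loader(filepath)
    except RuntimeError as err:
        # Keep the original exception as the cause so no information is lost.
        raise RuntimeError(
            f"Decoding {filepath!r} failed: {err}. "
            "With torchaudio 0.12.0 this can mean that the non-Python dependency "
            "sox is missing or incompatible; installing 'torchaudio<0.12.0' is a "
            "possible temporary workaround."
        ) from err
```

With torchaudio available, this could be called as `load_audio_with_hint(torchaudio.load, path)`.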

UPDATE:

This is an incompatibility between the latest torchaudio (0.12.0) and the sox backend.

TODO:

[x] As a temporary solution, we should recommend installing torchaudio<0.12.0
- #4777
- #4785
[ ] However, a stable solution must be found for torchaudio>=0.12.0

Related to:

https://github.com/huggingface/transformers/issues/18379

Aug 01 '22 14:08 albertvillanova

Requiring torchaudio<0.12.0 isn't really a viable solution: it implies torch<1.12.0, which means no sm_86 CUDA support and therefore no RTX 3090 support in PyTorch.

In my case, however, the error only occurs if _fallback_load resolves to _fail_load inside torchaudio 0.12.0, which happens only if FFmpeg initialization failed: https://github.com/pytorch/audio/blob/b1f510fa5681e92ee82bdc6b2d1ed896799fc32c/torchaudio/backend/sox_io_backend.py#L36-L47

That means the proper solution for torchaudio>=0.12.0 is to check torchaudio._extension._FFMPEG_INITIALIZED. If it is False, we need to remind the user to install a dynamically linked ffmpeg 4.1.8, and then maybe call torchaudio._extension._init_ffmpeg() to force a user-visible exception that names the missing ffmpeg dynamic library.
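
The check described above could be sketched roughly as follows. The function name `ffmpeg_hint` and its messages are hypothetical; `_extension` and `_FFMPEG_INITIALIZED` are the private torchaudio names mentioned in this comment and may not exist in other versions, so the sketch reads them defensively with `getattr`. The module is passed in as an argument, and a stand-in object is used for the demo so the snippet runs without torchaudio.

```python
from types import SimpleNamespace
from typing import Any


def ffmpeg_hint(torchaudio_module: Any) -> str:
    """Return a user-facing message about torchaudio's FFmpeg integration.

    Hypothetical helper; relies on private attributes that may change.
    """
    extension = getattr(torchaudio_module, "_extension", None)
    initialized = getattr(extension, "_FFMPEG_INITIALIZED", None)
    if initialized is True:
        return "FFmpeg integration is initialized."
    if initialized is False:
        return (
            "FFmpeg integration failed to initialize; install a dynamically "
            "linked ffmpeg and call torchaudio._extension._init_ffmpeg() to "
            "see which library is missing."
        )
    # Attribute missing: this torchaudio version does not expose the flag.
    return "Cannot determine FFmpeg status: the private flag is absent."


# Stand-in for a torchaudio install whose FFmpeg initialization failed.
fake_torchaudio = SimpleNamespace(
    _extension=SimpleNamespace(_FFMPEG_INITIALIZED=False)
)
print(ffmpeg_hint(fake_torchaudio))
```

With a real install, the call would be `ffmpeg_hint(torchaudio)` after `import torchaudio`.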

On my system, installing

libavcodec.so.58
libavdevice.so.58
libavfilter.so.7
libavformat.so.58
libavutil.so.56
libswresample.so.3
libswscale.so.5

from ffmpeg 4.1.8 made HF datasets 2.3.2 work just fine with torchaudio 0.12.1+cu116:

import sox, torchaudio, datasets
print('torchaudio', torchaudio.__version__)
print('datasets', datasets.__version__)
torchaudio._extension._init_ffmpeg()
print(torchaudio._extension._FFMPEG_INITIALIZED)
waveform, sample_rate = torchaudio.load('/workspace/.cache/huggingface/datasets/downloads/extracted/8e5aa88585efa2a4c74c6664b576550d32b7ff9c3d1d17cc04f44f11338c3dc6/cv-corpus-8.0-2022-01-19/en/clips/common_voice_en_100038.mp3', format='mp3')
print(waveform.shape)

torchaudio 0.12.1+cu116
datasets 2.3.2
True
torch.Size([1, 369792])
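
To see whether the shared libraries listed above are visible to the dynamic linker before trying torchaudio, a quick probe with the standard library might help. This is only a sketch: `locate_ffmpeg_libs` is a hypothetical helper, and `ctypes.util.find_library` reports whichever version it finds, which is not necessarily the ffmpeg 4.1.8 sonames given above.

```python
from ctypes.util import find_library
from typing import Dict, Optional

# Library base names corresponding to the .so files listed above.
FFMPEG_LIBS = (
    "avcodec",
    "avdevice",
    "avfilter",
    "avformat",
    "avutil",
    "swresample",
    "swscale",
)


def locate_ffmpeg_libs() -> Dict[str, Optional[str]]:
    """Map each FFmpeg library name to what the linker finds, or None."""
    return {name: find_library(name) for name in FFMPEG_LIBS}


if __name__ == "__main__":
    for name, found in locate_ffmpeg_libs().items():
        print(f"{name}: {found if found is not None else 'NOT FOUND'}")
```

Any entry reported as not found is a candidate for the missing dependency, although a library in a non-standard path can also be missed by this lookup.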

Aug 08 '22 18:08 fxtentacle

Related: https://github.com/huggingface/datasets/issues/4889

Aug 24 '22 16:08 patrickvonplaten

Closing as we no longer use torchaudio for decoding MP3 files.

Mar 02 '23 15:03 mariosasko
