
RuntimeError when using torchaudio 0.12.0 to load MP3 audio file

albertvillanova opened this issue

The current version of torchaudio (0.12.0) raises a RuntimeError when the sox_io backend is used but the non-Python dependency sox is not installed: https://github.com/pytorch/audio/blob/2e1388401c434011e9f044b40bc8374f2ddfc414/torchaudio/backend/sox_io_backend.py#L21-L29

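from typing import Optional, Tuple

import torch

# Stub that sox_io_backend falls back to: it always raises the same
# generic error, regardless of the underlying cause.
def _fail_load(
    filepath: str,
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    format: Optional[str] = None,
) -> Tuple[torch.Tensor, int]:
    raise RuntimeError("Failed to load audio from {}".format(filepath))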

Maybe we should raise a more actionable error message so that the user knows how to fix it.
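A minimal sketch of what a more actionable error could look like. It assumes a hypothetical wrapper, load_audio_with_hint, that takes any torchaudio.load-like callable and rewrites only the generic "Failed to load audio from" message quoted above; the hint wording is illustrative, not the actual datasets message.

```python
from typing import Any, Callable

# Substring of the message raised by torchaudio 0.12.0's _fail_load stub.
_FAIL_LOAD_PREFIX = "Failed to load audio from"


def load_audio_with_hint(path: str, load_fn: Callable[[str], Any]) -> Any:
    """Call load_fn(path) and turn torchaudio's generic failure into an actionable error.

    Hypothetical helper; not part of datasets or torchaudio.
    """
    try:
        return load_fn(path)
    except RuntimeError as err:
        if _FAIL_LOAD_PREFIX not in str(err):
            raise  # unrelated RuntimeError: propagate unchanged
        raise RuntimeError(
            f"Could not decode {path!r}: torchaudio 0.12.0 raises this generic error "
            "when its audio backend dependencies (sox or FFmpeg) are missing. "
            "Install them, or as a temporary workaround use torchaudio<0.12.0."
        ) from err
```

Usage would be load_audio_with_hint(path, torchaudio.load); the original exception stays attached as __cause__ for debugging.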

UPDATE:

  • this is an incompatibility between the latest torchaudio (0.12.0) and the sox backend

TODO:

  • [x] as a temporary solution, we should recommend installing torchaudio<0.12.0
    • #4777
    • #4785
  • [ ] however, a stable solution must be found for torchaudio>=0.12.0

Related to:

  • https://github.com/huggingface/transformers/issues/18379

albertvillanova, Aug 01 '22

Requiring torchaudio<0.12.0 isn't really a viable solution, because that implies torch<1.12.0, which means no sm_86 CUDA support and therefore no RTX 3090 support in PyTorch.

However, in my case the error only occurs if _fallback_load resolves to _fail_load inside torchaudio 0.12.0, which happens only if FFmpeg initialization failed: https://github.com/pytorch/audio/blob/b1f510fa5681e92ee82bdc6b2d1ed896799fc32c/torchaudio/backend/sox_io_backend.py#L36-L47

That means the proper solution for torchaudio>=0.12.0 is to check torchaudio._extension._FFMPEG_INITIALIZED. If it is False, we should remind the user to install a dynamically linked FFmpeg 4.1.8, and possibly call torchaudio._extension._init_ffmpeg() to force a user-visible exception that names the missing FFmpeg shared library.
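The check described above could look roughly like this. It is a sketch that assumes the private torchaudio 0.12.x attributes _extension._FFMPEG_INITIALIZED and _extension._init_ffmpeg behave as described in this comment; check_torchaudio_ffmpeg is a hypothetical helper name, and the module can be passed in so the logic can be exercised without torchaudio installed.

```python
import importlib
from typing import Any, Optional


def check_torchaudio_ffmpeg(torchaudio_module: Optional[Any] = None) -> None:
    """Raise an actionable RuntimeError if torchaudio's FFmpeg extension is not initialized.

    Hypothetical helper; relies on private torchaudio 0.12.x attributes
    (_extension._FFMPEG_INITIALIZED, _extension._init_ffmpeg) that may change.
    """
    if torchaudio_module is None:
        torchaudio_module = importlib.import_module("torchaudio")
    ext = getattr(torchaudio_module, "_extension", None)
    # Flag missing (e.g. another torchaudio version) or True: nothing to report.
    if ext is None or getattr(ext, "_FFMPEG_INITIALIZED", True):
        return
    hint = (
        "torchaudio could not initialize its FFmpeg extension, so MP3 decoding will fail. "
        "Install a dynamically linked FFmpeg 4.1.x (e.g. libavcodec.so.58)."
    )
    try:
        # Re-run initialization so the underlying error names the missing shared library.
        ext._init_ffmpeg()
    except Exception as err:
        raise RuntimeError(f"{hint} Underlying error: {err}") from err
    # Assumption: loaders chosen at import time may still be the failing stub,
    # so a late successful initialization is not enough on its own.
    raise RuntimeError(f"{hint} If FFmpeg was installed after importing torchaudio, restart Python.")
```

Passing the module explicitly is a design choice for testability; in real use one would call check_torchaudio_ffmpeg() with no arguments before decoding.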

On my system, installing

  • libavcodec.so.58
  • libavdevice.so.58
  • libavfilter.so.7
  • libavformat.so.58
  • libavutil.so.56
  • libswresample.so.3
  • libswscale.so.5

from ffmpeg 4.1.8 made HF datasets 2.3.2 work just fine with torchaudio 0.12.1+cu116:

import sox, torchaudio, datasets
print('torchaudio', torchaudio.__version__)
print('datasets', datasets.__version__)
# Private torchaudio 0.12 API: (re)initialize the FFmpeg extension;
# it raises an exception naming the library if one is missing.
torchaudio._extension._init_ffmpeg()
print(torchaudio._extension._FFMPEG_INITIALIZED)
waveform, sample_rate = torchaudio.load('/workspace/.cache/huggingface/datasets/downloads/extracted/8e5aa88585efa2a4c74c6664b576550d32b7ff9c3d1d17cc04f44f11338c3dc6/cv-corpus-8.0-2022-01-19/en/clips/common_voice_en_100038.mp3', format='mp3')
print(waveform.shape)

Output:

torchaudio 0.12.1+cu116
datasets 2.3.2
True
torch.Size([1, 369792])
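A minimal sketch for checking whether the FFmpeg 4.1 libraries listed above can be opened by the dynamic loader. It assumes Linux sonames and uses ctypes.CDLL, which only shows that each library loads, not that torchaudio will accept that particular build; missing_shared_libs is a hypothetical helper name.

```python
import ctypes
from typing import Iterable, List

# Sonames of the FFmpeg 4.1 shared libraries listed above (Linux naming).
FFMPEG_4_1_SONAMES = (
    "libavcodec.so.58",
    "libavdevice.so.58",
    "libavfilter.so.7",
    "libavformat.so.58",
    "libavutil.so.56",
    "libswresample.so.3",
    "libswscale.so.5",
)


def missing_shared_libs(sonames: Iterable[str] = FFMPEG_4_1_SONAMES) -> List[str]:
    """Return the sonames that the dynamic loader cannot open."""
    missing = []
    for name in sonames:
        try:
            ctypes.CDLL(name)  # dlopen() using the normal library search path
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_shared_libs() or "all FFmpeg 4.1 shared libraries found")
```

An empty result is a necessary but not sufficient condition for torchaudio 0.12.x to initialize FFmpeg.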

fxtentacle, Aug 08 '22

Related: https://github.com/huggingface/datasets/issues/4889

patrickvonplaten, Aug 24 '22

Closing as we no longer use torchaudio for decoding MP3 files.

mariosasko, Mar 02 '23