RuntimeError when using torchaudio 0.12.0 to load MP3 audio file
The current version of torchaudio (0.12.0) raises a RuntimeError when trying to use the sox_io backend while the non-Python dependency sox is not installed:
https://github.com/pytorch/audio/blob/2e1388401c434011e9f044b40bc8374f2ddfc414/torchaudio/backend/sox_io_backend.py#L21-L29
def _fail_load(
    filepath: str,
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    format: Optional[str] = None,
) -> Tuple[torch.Tensor, int]:
    raise RuntimeError("Failed to load audio from {}".format(filepath))
Maybe we should raise a more actionable error message so that the user knows how to fix it.
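As a rough illustration of what a more actionable message could look like, here is a minimal sketch. The wrapper name `load_with_hint` is hypothetical and not part of torchaudio or datasets; it takes the loader as an argument so it makes no assumption about how the library itself would implement this.

```python
from typing import Any, Callable


def load_with_hint(load_fn: Callable[..., Any], filepath: str, **kwargs: Any) -> Any:
    """Call load_fn and re-raise its RuntimeError with installation advice.

    Hypothetical helper: load_fn would typically be torchaudio.load.
    """
    try:
        return load_fn(filepath, **kwargs)
    except RuntimeError as err:
        # Keep the original exception chained so no detail is lost.
        raise RuntimeError(
            f"Could not decode {filepath!r}: {err}. "
            "With torchaudio 0.12.0 this may mean that the sox or ffmpeg "
            "native libraries are missing; install them or use torchaudio<0.12.0."
        ) from err
```

It would be called as `load_with_hint(torchaudio.load, path, format="mp3")`; the original exception stays attached as `__cause__`.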
UPDATE:
- this is an incompatibility between the latest torchaudio (0.12.0) and the sox backend
TODO:
- [x] as a temporary solution, we should recommend installing torchaudio<0.12.0
  - #4777
  - #4785
- [ ] however, a stable solution must be found for torchaudio>=0.12.0
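To decide when the torchaudio<0.12.0 recommendation applies, one option is to inspect the installed version string. The sketch below is only an illustration: the helper names `release_tuple` and `is_affected` are hypothetical, and the parsing is a simplification that reads only the leading numeric part of the version.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """Return True for torchaudio 0.12 and later (major.minor comparison)."""
    return release_tuple(version)[:2] >= (0, 12)


# Typical use (requires torchaudio to be installed):
#   from importlib.metadata import version
#   if is_affected(version("torchaudio")):
#       print("Consider installing torchaudio<0.12.0 as a temporary workaround.")
```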
Related to:
- https://github.com/huggingface/transformers/issues/18379
Requiring torchaudio<0.12.0 isn't really a viable solution, because that implies torch<1.12.0, which means no sm_86 CUDA support and therefore no RTX 3090 support in PyTorch.
But in my case, the error only occurs if _fallback_load resolves to _fail_load inside torchaudio 0.12.0, which is only the case if FFMPEG initialization failed: https://github.com/pytorch/audio/blob/b1f510fa5681e92ee82bdc6b2d1ed896799fc32c/torchaudio/backend/sox_io_backend.py#L36-L47
That means the proper solution for torchaudio>=0.12.0 is to check torchaudio._extension._FFMPEG_INITIALIZED. If it is False, we need to remind the user to install a dynamically linked ffmpeg 4.1.8, and then maybe call torchaudio._extension._init_ffmpeg() to force a user-visible exception showing the name of the missing ffmpeg dynamic library.
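The check described above could look roughly like the following sketch. The function name `ensure_ffmpeg` is hypothetical; `_FFMPEG_INITIALIZED` and `_init_ffmpeg` are the private torchaudio 0.12 names mentioned in this comment, and being private they may change in other releases. The extension module is passed in as an argument, so the function itself does not import torchaudio.

```python
def ensure_ffmpeg(extension) -> None:
    """Raise an actionable error if the FFmpeg bindings are not initialized.

    `extension` is expected to be torchaudio._extension (torchaudio 0.12.x).
    """
    if getattr(extension, "_FFMPEG_INITIALIZED", False):
        return  # FFmpeg was loaded successfully at import time.
    try:
        # Retrying the initialization surfaces the underlying loader error,
        # which names the missing shared library.
        extension._init_ffmpeg()
    except Exception as err:
        raise RuntimeError(
            "torchaudio could not initialize FFmpeg, so MP3 decoding will fail. "
            "Install a dynamically linked ffmpeg and retry. "
            f"Underlying error: {err}"
        ) from err
```

With torchaudio installed, this would be used as `import torchaudio; ensure_ffmpeg(torchaudio._extension)` before the first call to `torchaudio.load`.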
On my system, installing
- libavcodec.so.58
- libavdevice.so.58
- libavfilter.so.7
- libavformat.so.58
- libavutil.so.56
- libswresample.so.3
- libswscale.so.5
from ffmpeg 4.1.8 made HF datasets 2.3.2 work just fine with torchaudio 0.12.1+cu116:
import sox, torchaudio, datasets
print('torchaudio', torchaudio.__version__)
print('datasets', datasets.__version__)
torchaudio._extension._init_ffmpeg()
print(torchaudio._extension._FFMPEG_INITIALIZED)
waveform, sample_rate = torchaudio.load('/workspace/.cache/huggingface/datasets/downloads/extracted/8e5aa88585efa2a4c74c6664b576550d32b7ff9c3d1d17cc04f44f11338c3dc6/cv-corpus-8.0-2022-01-19/en/clips/common_voice_en_100038.mp3', format='mp3')
print(waveform.shape)
torchaudio 0.12.1+cu116
datasets 2.3.2
True
torch.Size([1, 369792])
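To find out which of the shared libraries listed above are missing on a given machine, they can be probed directly with the standard-library `ctypes` module. This is a minimal sketch assuming a Linux system with the same sonames as on my system; the helper name `missing_libraries` and the `loader` parameter are hypothetical and exist only to keep the function easy to test.

```python
import ctypes
from typing import Callable, Iterable, List

# Sonames from the list above (ffmpeg 4.x on Linux).
FFMPEG_SONAMES = [
    "libavcodec.so.58",
    "libavdevice.so.58",
    "libavfilter.so.7",
    "libavformat.so.58",
    "libavutil.so.56",
    "libswresample.so.3",
    "libswscale.so.5",
]


def missing_libraries(
    names: Iterable[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> List[str]:
    """Return the names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            # ctypes.CDLL raises OSError when the library cannot be loaded.
            missing.append(name)
    return missing
```

Calling `missing_libraries(FFMPEG_SONAMES)` returns an empty list when every library can be loaded, and otherwise the names that still need to be installed.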
Related: https://github.com/huggingface/datasets/issues/4889
Closing as we no longer use torchaudio for decoding MP3 files.