audio Loading a BytesIO opus file does not seem to work

🐛 Bug

Loading a BytesIO opus file does not seem to work.

To Reproduce

Steps to reproduce the behavior:

import torchaudio
import io

print(torchaudio.__version__)

# samples from https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/audio-samples.html

torchaudio.load("ff-16b-2c-44100hz.mp3")
torchaudio.load("ff-16b-2c-44100hz.opus")

def file_like(filepath):
    # read the whole file into memory and close the handle right away
    with open(filepath, "rb") as f:
        return io.BytesIO(f.read())

torchaudio.load(file_like("ff-16b-2c-44100hz.mp3"), format="mp3")
# the opus call below crashes with:
# formats: can't open input file `': Input not an Ogg Opus audio stream
# Traceback (most recent call last):
#   File "test.py", line 14, in <module>
#     torchaudio.load(file_like("ff-16b-2c-44100hz.opus"), format="opus")
#   File "/Users/csh/miniconda3/lib/python3.7/site-packages/torchaudio/backend/sox_io_backend.py", line 150, in load
#     filepath, frame_offset, num_frames, normalize, channels_first, format)
# RuntimeError: Error loading audio file: failed to open file <in memory buffer>
torchaudio.load(file_like("ff-16b-2c-44100hz.opus"), format="opus")

Expected behavior

It seems like it should work as well as it does for the mp3 case.

Environment

What commands did you use to install torchaudio (conda/pip/build from source)?
pip
If you are building from source, which commit is it?
What does torchaudio.__version__ print? (If applicable)
0.9.0

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch Version (e.g., 1.0): 1.9.0
OS (e.g., Linux): mac os x
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source):
Python version: 3.7.3
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

Additional context

Sep 19 '21 01:09 christopherhesse

Hi @christopherhesse

Can you try increasing the buffer size?

torchaudio.utils.sox_utils.set_buffer_size(16000)

This happens because the header of this OPUS file is larger than the default buffer size that torchaudio uses to read the header. The OPUS format is tricky in that it allows a header of arbitrary size. The officially recommended header size is below 6k, and torchaudio uses 4k as the default buffer size, but this file seems to have a 16k header.
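To see how large the header of a given file actually is, one option is to walk the Ogg pages directly. The following is a minimal sketch, not part of torchaudio: `opus_header_size` is a hypothetical helper that assumes a single Ogg Opus stream starting at byte 0, does not verify page checksums, and counts the bytes of all pages up to and including the one where the second packet (the comment header) ends. Whether libsox needs exactly this many bytes in its buffer is an assumption.

```python
import io


def opus_header_size(stream):
    """Return the byte length of the Ogg pages that carry the first two
    packets of the stream (for Ogg Opus: the ID and comment headers)."""
    packets_done = 0
    offset = 0
    while packets_done < 2:
        # Fixed 27-byte Ogg page header; byte 26 is the segment count.
        header = stream.read(27)
        if len(header) < 27 or header[:4] != b"OggS":
            raise ValueError("no Ogg page at offset %d" % offset)
        n_segments = header[26]
        lacing = stream.read(n_segments)
        if len(lacing) < n_segments:
            raise ValueError("truncated segment table at offset %d" % offset)
        body_size = sum(lacing)
        # Skip the page body; only the lacing values matter here.
        stream.seek(body_size, io.SEEK_CUR)
        # A lacing value below 255 marks the end of a packet.
        packets_done += sum(1 for value in lacing if value < 255)
        offset += 27 + n_segments + body_size
    return offset
```

The returned value could then be passed to torchaudio.utils.sox_utils.set_buffer_size before loading, instead of guessing a number such as 16000.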

Sep 19 '21 02:09 mthrok

@hwangjeff I think we can issue a warning if loading an opus file from a byte buffer fails.

Sep 19 '21 02:09 mthrok

Thanks @mthrok, that does immediately fix the loading error. However, I now get a different issue, shown here:

import torchaudio
import io

torchaudio.utils.sox_utils.set_buffer_size(16000)
print(torchaudio.__version__)

# samples from https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/audio-samples.html

torchaudio.load("ff-16b-2c-44100hz.mp3")
data, sample_rate = torchaudio.load("ff-16b-2c-44100hz.opus")
print(data.shape, sample_rate)

def file_like(filepath):
    # read the whole file into memory and close the handle right away
    with open(filepath, "rb") as f:
        return io.BytesIO(f.read())

torchaudio.load(file_like("ff-16b-2c-44100hz.mp3"), format="mp3")
data, sample_rate = torchaudio.load(file_like("ff-16b-2c-44100hz.opus"), format="opus")
print(data.shape, sample_rate)

The output of this script is:

0.9.0
torch.Size([2, 8980158]) 48000
torch.Size([2, 143688]) 48000

The odd thing is that this is the same file each time, so the data shape should be the same both times.
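One way to tell which of the two shapes is plausible, independent of torchaudio, is to read the frame count from the Ogg container itself. This is a rough sketch under stated assumptions: `opus_expected_frames` is a hypothetical helper, it assumes a single well-formed Ogg Opus stream, it locates the last page by searching for the `OggS` marker (which could in principle also occur inside audio data), and it takes the last page's granule position minus the pre-skip from the `OpusHead` packet as the number of frames at 48 kHz. A decoder may report a slightly different count.

```python
import struct


def opus_expected_frames(data):
    """Estimate the decoded frame count (per channel, at 48 kHz) of an
    Ogg Opus file from its container metadata."""
    head = data.find(b"OpusHead")
    if head < 0:
        raise ValueError("OpusHead packet not found")
    # OpusHead layout: 8-byte magic, version, channel count, then the
    # pre-skip as a little-endian uint16 at byte offset 10.
    (pre_skip,) = struct.unpack_from("<H", data, head + 10)
    last_page = data.rfind(b"OggS")
    if last_page < 0:
        raise ValueError("no Ogg page found")
    # The granule position is a little-endian int64 at byte 6 of a page.
    (granule,) = struct.unpack_from("<q", data, last_page + 6)
    return granule - pre_skip
```

Calling it on the bytes of ff-16b-2c-44100hz.opus and comparing the result with data.shape[1] from each load would show which of the two calls returned a truncated tensor.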

Sep 19 '21 02:09 christopherhesse

Thanks @christopherhesse for the report. That indeed looks strange, and I confirm that I observed the same issue on my env. I will look into it.

Sep 20 '21 18:09 mthrok

We have removed file-like object support from libsox; this is now handled by the ffmpeg backend, and it seems to work fine. I will close this issue.

Jul 31 '23 16:07 mthrok
