Reading FLAC files from file-like objects is broken?
🐛 Describe the bug
Calling torchaudio.load with a file-like object as argument to load FLAC-encoded files sometimes returns incomplete tensors. It seems to happen with random (non-constant) tensors above a certain length.
Minimal sample code:
Below I run the following test: I write a given input tensor to the filesystem in FLAC format using torchaudio.save, and then load it back with torchaudio.load, passing a file-like object as argument. I repeat this with tensors of different shapes. The shape of each loaded tensor is written in the comments; it does not always match the input tensor!
```python
import torch
import torchaudio

fs = 16000
filename = 'example.flac'

def test(waveform):
    # write data
    torchaudio.save(filename, waveform, fs)
    # load the data that was just saved, using a file object
    with open(filename, 'rb') as f:
        output, _ = torchaudio.load(f)
    # check the output shape (prints 1 if it matches the input shape, 0 otherwise)
    print(int(waveform.shape == output.shape), output.shape)

torch.manual_seed(0)

# non-random vectors as int or float work fine
test(torch.zeros(1, 1600, dtype=torch.int))  # correct
test(torch.ones(1, 1600, dtype=torch.float))  # correct
test(10*torch.ones(2, 16000, dtype=torch.int))  # correct
test(-100*torch.ones(2, 160000, dtype=torch.float))  # correct

# short random vectors also work fine (0.1s @ 16kHz)
test(torch.randn(1, 1600))  # correct
test(torch.randn(2, 1600))  # correct
test(torch.rand(1, 1600))  # correct
test(torch.rand(2, 1600))  # correct

# longer random vectors produce wrong shapes! (1s @ 16kHz)
test(torch.randn(1, 16000))  # wrong, torch.Size([1, 4096]), raises warning "flac: decoder MD5 checksum mismatch."
test(torch.randn(2, 16000))  # wrong, torch.Size([2, 0])
test(torch.rand(1, 16000))  # wrong, torch.Size([1, 8192]), raises warning "flac: decoder MD5 checksum mismatch."
test(torch.rand(2, 16000))  # wrong, torch.Size([2, 0])

# maybe staying in [-1.0, 1.0] fixes it?
test(0.01*torch.randn(1, 16000))  # correct
test(0.01*torch.randn(2, 16000))  # wrong, torch.Size([2, 0])
test(0.01*torch.rand(1, 16000))  # correct
test(0.01*torch.rand(2, 16000))  # wrong, torch.Size([2, 0])

# what about longer waveforms? (10s @ 16kHz)
test(0.01*torch.randn(1, 160000))  # wrong, torch.Size([1, 24576]), raises warning "flac: decoder MD5 checksum mismatch."
test(0.01*torch.randn(2, 160000))  # wrong, torch.Size([2, 0])
test(0.01*torch.rand(1, 160000))  # wrong, torch.Size([1, 69632]), raises warning "flac: decoder MD5 checksum mismatch."
test(0.01*torch.rand(2, 160000))  # wrong, torch.Size([2, 0])
```
Results:
- Tensors full of zeros or ones, whether cast as `int` or `float`, always produce correct shapes
- Random tensors as short as 0.1s at 16kHz work too
- Longer random tensors of 1s or 10s at 16kHz produce wrong shapes!
  - 1-channel tensors are loaded incomplete and raise a "flac: decoder MD5 checksum mismatch." warning
  - 2-channel tensors are always returned empty!
  - Scaling down the single-channel random tensors fixed the issue for 1s-long tensors...
  - ...but increasing the duration to 10s breaks it again!
Notes:
- Writing/loading WAV, OGG or VORBIS files by changing the extension, e.g. setting `filename = 'example.wav'`, always produces correct shapes
- Writing/loading using the filename instead of a file-like object produces correct shapes
- [Maybe a different bug] Writing/loading MP3 "kind of" works, but one needs to provide `format="mp3"` in `torchaudio.load`, as if the format could not be inferred from the file header. I say "kind of" because the output shapes are actually exactly 704 samples longer than the input shapes! When omitting `format="mp3"`, the following error is raised:
```
formats: can't determine type of file `'
Traceback (most recent call last):
  File "/home/phigon/dev/brever/temp2.py", line 24, in <module>
    test(torch.zeros(1, 1600, dtype=torch.int)) # correct
  File "/home/phigon/dev/brever/temp2.py", line 15, in test
    output, _ = torchaudio.load(f)
  File "/home/phigon/dev/brever/venv/lib64/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 149, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Error loading audio file: failed to open file <in memory buffer>
```
Some more testing
Since in the code sample above I am writing the FLAC files with torchaudio, I thought maybe the issue comes from the writing and not the reading. But when trying to read from files created differently, problems arise too!
```python
import subprocess

import numpy as np
import soundfile as sf
import torchaudio

fs = 16000
filename = 'example.flac'
np.random.seed(0)

# TEST 1: write with soundfile
x = np.random.randn(16000, 2)  # in soundfile, shape is (samples, channels)
sf.write(filename, x, fs)
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
print(x.shape)
# good shape with libsndfile 1.0.25
# bad shape with libsndfile 1.0.31, got torch.Size([2, 0])

# TEST 2: write with soundfile, different tensor
x = np.random.rand(160000, 1)
sf.write(filename, x, fs)
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
print(x.shape)
# bad with libsndfile 1.0.25, got torch.Size([1, 40320])
# good with libsndfile 1.0.31

# TEST 3: write with ffmpeg
x = np.random.rand(16000, 1)
tmpfile = 'example.wav'
sf.write(tmpfile, x, fs)  # first write a temporary WAV file
subprocess.call([  # convert the temporary file to FLAC with FFmpeg
    'ffmpeg',
    '-y',
    '-hide_banner',
    '-loglevel',
    'error',
    '-i',
    tmpfile,
    filename,
])
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
# raises RuntimeError; passing format="flac" does not help
```
Above I perform the following tests:
- Write a 1s-long 2-channel tensor of normally distributed samples with `soundfile`
  - Output shape is correct with libsndfile 1.0.25, but wrong with libsndfile 1.0.31!
- Write a 10s-long 1-channel tensor of uniformly distributed samples with `soundfile`
  - Output shape is correct with libsndfile 1.0.31, but wrong with libsndfile 1.0.25!
- Write with FFmpeg
  - Can't even load; raises a `RuntimeError`:
```
formats: can't open input file `': FLAC ERROR whilst decoding metadata
Traceback (most recent call last):
  File "/home/phigon/dev/brever/temp3.py", line 47, in <module>
    x, _ = torchaudio.load(f)
  File "/home/phigon/dev/brever/venv/lib64/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 149, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Error loading audio file: failed to open file <in memory buffer>
```
Here again, when calling torchaudio.load with the filename instead of the file-like object as argument, everything works fine.
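Since loading from a path works, one possible stopgap when the audio only exists as a file-like object (e.g. an in-memory buffer) is to copy it to a temporary file and pass that path to `torchaudio.load`. The sketch below is a hypothetical helper (`spooled_path` is not part of torchaudio); it keeps a `.flac` suffix on the assumption that the backend uses the file extension to pick the format, and it has not been checked against every backend.

```python
import contextlib
import io
import os
import shutil
import tempfile
from typing import Iterator


@contextlib.contextmanager
def spooled_path(fileobj: io.BufferedIOBase, suffix: str = ".flac") -> Iterator[str]:
    """Copy a binary file-like object to a temporary file and yield its path.

    Hypothetical helper, not a torchaudio API. Data is copied from the
    object's current position; the temporary file is deleted on exit.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # os.fdopen takes ownership of fd and closes it when the block ends
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(fileobj, tmp)
        yield path
    finally:
        os.remove(path)


# Usage (assumes torchaudio is installed):
#     with open("example.flac", "rb") as f, spooled_path(f) as path:
#         waveform, sample_rate = torchaudio.load(path)
```

The cost is one extra copy to disk, so this only makes sense until file-like objects are decoded correctly.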
FFmpeg version: 4.4.2
SoundFile version: 0.10.3.post1
Versions
```
Collecting environment information...
PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Fedora release 35 (Thirty Five) (x86_64)
GCC version: (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.34

Python version: 3.10.4 (main, Mar 25 2022, 00:00:00) [GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] (64-bit runtime)
Python platform: Linux-5.17.4-200.fc35.x86_64-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] Could not collect
```
Hi @philgzl
Thanks for the detailed report. I will take a look into it, but I have limited bandwidth right now. Feel free to ping me for updates.
Hi @mthrok, I was wondering if there has been any update on this issue.
Update
After some investigation, the issue is caused by the sox_io backend. Setting the torchaudio backend to soundfile seems to fix it, which leads me to believe that the fallback options do not catch the issue mentioned above.
As mentioned above, as a temporary workaround you can set the backend to soundfile as follows:
```python
torchaudio.set_audio_backend('soundfile')
```
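Judging from the comment above, decoding through soundfile avoids the problem. A sketch of calling soundfile directly, assuming the `soundfile` package is installed: `soundfile.read` accepts file-like objects but returns arrays shaped `(frames, channels)` (as noted in the second test script), whereas `torchaudio.load` returns `(channels, frames)`, so the result needs transposing. Both function names below are hypothetical, not part of torchaudio or soundfile.

```python
import numpy as np


def to_channels_first(data: np.ndarray) -> np.ndarray:
    """Convert soundfile's (frames, channels) layout to torchaudio's (channels, frames)."""
    if data.ndim == 1:
        # mono data read without always_2d=True: add a channel axis -> (1, frames)
        return data[np.newaxis, :]
    # transpose and copy so the result is C-contiguous
    return np.ascontiguousarray(data.T)


def load_fileobj_with_soundfile(fileobj):
    """Decode a file-like object with soundfile; return (channels-first array, sample rate)."""
    import soundfile as sf  # imported here so to_channels_first works without soundfile installed

    data, sample_rate = sf.read(fileobj, dtype="float32", always_2d=True)
    return to_channels_first(data), sample_rate
```

Wrapping the returned array with `torch.from_numpy` gives a tensor in the same layout as the one returned by `torchaudio.load`.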
Hi @philgzl
Sorry for the silence. I have not located why it's broken, but I confirmed that the new FFmpeg-based file-like object support works. Libsox's file-like object support is based on a hack, so I think the way forward is to switch to FFmpeg-based decoding for all file-like object support.
We now have an FFmpeg backend which works fine with FLAC, so please use it. Thanks! Ref #2662