🐛 Describe the bug

Calling torchaudio.load with a file-like object as argument to load FLAC-encoded files sometimes returns incomplete tensors. It seems to happen with non-trivial (e.g. random) tensors beyond a certain length.

Minimal sample code:

Below I conduct the following test: I write a given input tensor to the filesystem in FLAC format using torchaudio.save, and then load it with torchaudio.load, passing a file-like object as argument. I repeat this with tensors of different shapes and contents. The shape of the loaded tensor is written in the comments; it does not always match the input tensor!

import torch
import torchaudio


fs = 16000
filename = 'example.flac'


def test(input):
    # write data
    torchaudio.save(filename, input, fs)

    # load data that was just saved using file object
    with open(filename, 'rb') as f:
        output, _ = torchaudio.load(f)

    # check the output shape
    print(int(input.shape == output.shape), output.shape)


torch.manual_seed(0)

# non-random vectors as int or float work fine
test(torch.zeros(1, 1600, dtype=torch.int))          # correct
test(torch.ones(1, 1600, dtype=torch.float))         # correct
test(10*torch.ones(2, 16000, dtype=torch.int))       # correct
test(-100*torch.ones(2, 160000, dtype=torch.float))  # correct

# short random vectors also work fine (0.1s @ 16kHz)
test(torch.randn(1, 1600))  # correct
test(torch.randn(2, 1600))  # correct
test(torch.rand(1, 1600))   # correct
test(torch.rand(2, 1600))   # correct

# longer random vectors produce wrong shapes! (1s @ 16kHz)
test(torch.randn(1, 16000))  # wrong, torch.Size([1, 4096]), raises warning "flac: decoder MD5 checksum mismatch."
test(torch.randn(2, 16000))  # wrong, torch.Size([2, 0])
test(torch.rand(1, 16000))   # wrong, torch.Size([1, 8192]), raises warning "flac: decoder MD5 checksum mismatch."
test(torch.rand(2, 16000))   # wrong, torch.Size([2, 0])

# maybe staying in [-1.0, 1.0] fixes it?
test(0.01*torch.randn(1, 16000))  # correct
test(0.01*torch.randn(2, 16000))  # wrong, torch.Size([2, 0])
test(0.01*torch.rand(1, 16000))   # correct
test(0.01*torch.rand(2, 16000))   # wrong, torch.Size([2, 0])

# what about longer waveforms? (10s @ 16kHz)
test(0.01*torch.randn(1, 160000))  # wrong, torch.Size([1, 24576]), raises warning "flac: decoder MD5 checksum mismatch."
test(0.01*torch.randn(2, 160000))  # wrong, torch.Size([2, 0])
test(0.01*torch.rand(1, 160000))   # wrong, torch.Size([1, 69632]), raises warning "flac: decoder MD5 checksum mismatch."
test(0.01*torch.rand(2, 160000))   # wrong, torch.Size([2, 0])

Results:

- Tensors full of zeros or ones, whether cast to int or float, always produce correct shapes
- Random tensors as short as 0.1s at 16kHz work too
- Longer random tensors of 1s or 10s at 16kHz produce wrong shapes!
  - 1-channel tensors are loaded incomplete and raise a "flac: decoder MD5 checksum mismatch." warning
  - 2-channel tensors are always returned empty!
  - Scaling down the single-channel random tensors fixed the issue for 1s-long tensors...
  - ...but increasing the duration to 10s breaks it again!
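
One way to tell whether the samples are missing from the file itself or are lost while decoding is to read the sample count that the FLAC header declares and compare it with the loaded shape. The sketch below is not part of the original test: it parses the STREAMINFO metadata block by hand, following the layout in the FLAC format specification. The helper name read_flac_streaminfo is made up for illustration, and it assumes the stream starts directly with the fLaC marker (no ID3 tag in front).

```python
def read_flac_streaminfo(fileobj):
    """Return the stream parameters declared in a FLAC STREAMINFO block.

    Hypothetical helper (not a torchaudio API). Expects a binary file-like
    object positioned at the start of a FLAC stream.
    """
    # 4-byte marker + 4-byte metadata block header + 34-byte STREAMINFO body
    header = fileobj.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")

    block_type = header[4] & 0x7F  # top bit is the "last metadata block" flag
    block_length = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_length != 34:
        raise ValueError("first metadata block is not STREAMINFO")

    # Bytes 10..17 of STREAMINFO pack four fields into 64 bits:
    # 20 bits sample rate | 3 bits (channels - 1)
    # | 5 bits (bits per sample - 1) | 36 bits total samples per channel
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "num_channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "num_frames": packed & ((1 << 36) - 1),
    }
```

If num_frames read from example.flac matches input.shape[1] while output.shape[1] does not, the file would appear to be complete and the loss would be on the reading side. A num_frames of 0 means the encoder did not record the length, in which case this check says nothing.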

Notes:

- Writing to and loading from WAV, OGG or VORBIS by changing the extension (e.g. setting filename = 'example.wav') always produces correct shapes
- Writing to and loading from a filename instead of a file-like object produces correct shapes
- [Maybe a different bug] Writing to and loading from MP3 "kind of" works, but one needs to provide format="mp3" in torchaudio.load, as if the format were not inferred from the file header. I say "kind of" because the output shapes are exactly 704 samples longer than the input shapes! When omitting format="mp3", the following error is raised:

formats: can't determine type of file `'
Traceback (most recent call last):
  File "/home/phigon/dev/brever/temp2.py", line 24, in <module>
    test(torch.zeros(1, 1600, dtype=torch.int))                # correct
  File "/home/phigon/dev/brever/temp2.py", line 15, in test
    output, _ = torchaudio.load(f)
  File "/home/phigon/dev/brever/venv/lib64/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 149, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Error loading audio file: failed to open file <in memory buffer>

Some more testing

Since in the code sample above I am writing the FLAC files with torchaudio, I thought maybe the issue comes from the writing and not the reading. But when trying to read from files created differently, problems arise too!

import subprocess

import numpy as np
import soundfile as sf
import torchaudio


fs = 16000
filename = 'example.flac'


np.random.seed(0)

# TEST 1: write with soundfile
x = np.random.randn(16000, 2)  # in soundfile shape is (samples, channels)
sf.write(filename, x, fs)
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
print(x.shape)
# good shape with libsndfile 1.0.25
# bad  shape with libsndfile 1.0.31, got torch.Size([2, 0])


# TEST 2: write with soundfile, different tensor
x = np.random.rand(160000, 1)
sf.write(filename, x, fs)
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
print(x.shape)
# bad  with libsndfile 1.0.25, got torch.Size([1, 40320])
# good with libsndfile 1.0.31


# TEST 3: write with ffmpeg
x = np.random.rand(16000, 1)
tmpfile = 'example.wav'
sf.write(tmpfile, x, fs)  # first write a temporary WAV file
subprocess.call([  # convert temporary file to FLAC with FFmpeg
    'ffmpeg',
    '-y',
    '-hide_banner',
    '-loglevel',
    'error',
    '-i',
    tmpfile,
    filename,
])
with open(filename, "rb") as f:
    x, _ = torchaudio.load(f)
# raises RuntimeError; passing format="flac" does not help

Above I perform the following tests:

- Write with soundfile a 1s-long 2-channel tensor of normally distributed samples
  - Output shape is correct with libsndfile 1.0.25, but wrong with libsndfile 1.0.31!
- Write with soundfile a 10s-long 1-channel tensor of uniformly distributed samples
  - Output shape is correct with libsndfile 1.0.31, but wrong with libsndfile 1.0.25!
- Write with FFmpeg
  - Can't even load, raises a RuntimeError:

formats: can't open input file `': FLAC ERROR whilst decoding metadata
Traceback (most recent call last):
  File "/home/phigon/dev/brever/temp3.py", line 47, in <module>
    x, _ = torchaudio.load(f)
  File "/home/phigon/dev/brever/venv/lib64/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 149, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Error loading audio file: failed to open file <in memory buffer>

Here again, when calling torchaudio.load with the filename instead of the file-like object as argument, everything works fine.
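
Since loading by filename works, one possible stopgap when only a file-like object is available is to spool it to a temporary file and load that path instead. This is a minimal sketch under that assumption, not something tested in the report above. The name load_via_tempfile and the loader parameter are hypothetical; the intended use is to pass torchaudio.load as loader.

```python
import os
import shutil
import tempfile


def load_via_tempfile(fileobj, loader, suffix=".flac"):
    """Copy a binary file-like object to a temp file and call loader(path).

    Hypothetical helper: `loader` is any callable that takes a file path,
    for example torchaudio.load.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            # Stream the bytes to disk; the file is closed on leaving the block
            shutil.copyfileobj(fileobj, tmp)
        return loader(tmp.name)
    finally:
        # Always clean up, even if the loader raises
        os.remove(tmp.name)


# Intended usage (assumes torchaudio is installed):
# with open("example.flac", "rb") as f:
#     waveform, sample_rate = load_via_tempfile(f, torchaudio.load)
```

This costs one extra copy on disk per load, so it only makes sense as a temporary measure.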

FFmpeg version: 4.4.2
SoundFile version: 0.10.3.post1

Versions

Collecting environment information...
PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Fedora release 35 (Thirty Five) (x86_64)
GCC version: (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.34

Python version: 3.10.4 (main, Mar 25 2022, 00:00:00) [GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] (64-bit runtime)
Python platform: Linux-5.17.4-200.fc35.x86_64-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] Could not collect

Apr 28 '22 10:04 philgzl

Hi @philgzl

Thanks for the detailed report. I will look into it, but I have limited bandwidth right now. Feel free to ping me for an update.

May 05 '22 16:05 mthrok

Hi @mthrok, I was wondering if there has been any update on this issue.

Update

After some digging, the issue appears to be caused by the sox_io backend. Setting the torchaudio backend to soundfile seems to fix it, which leads me to believe that the fallback options do not catch the issue described above.

As a temporary workaround, you can set the backend to soundfile as follows:

torchaudio.set_audio_backend('soundfile')

Sep 04 '22 11:09 knoriy

Hi @philgzl

Sorry for the silence. I have not located why it's broken, but I confirmed that the new FFmpeg-based file-like object support works. Libsox's file-like object support is based on a hack, so I think the way forward is to switch to FFmpeg-based decoding for all file-like object support.

Sep 09 '22 02:09 mthrok

We have an FFmpeg backend which works fine with FLAC, so please use it. Thanks, ref #2662

Jul 31 '23 16:07 mthrok

Reading FLAC files from file-like objects is broken?
