audio StreamReader Failed to open the input io.BytesIO

🐛 Describe the bug

StreamReader fails to open an io.BytesIO input that was written by torchaudio.save.

import io
import torchaudio
from torchaudio.io import StreamReader

wav_file = "demo.wav"
streamer = StreamReader(wav_file)  # works fine

with open(wav_file, 'rb') as f:
    streamer = StreamReader(f)  # works fine

waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()
torchaudio.save(buffer_, waveform, sample_rate, format="wav")
streamer = StreamReader(buffer_) # error

Traceback (most recent call last):
  File "/home/jackie/code/temp.py", line 14, in <module>
    streamer = StreamReader(buffer_)
  File "/home/jackie/anaconda3/lib/python3.9/site-packages/torchaudio/io/_stream_reader.py", line 351, in __init__
    self._be = torchaudio._torchaudio_ffmpeg.StreamReaderFileObj(src, format, option, buffer_size)
RuntimeError: Failed to open the input "<_io.BytesIO object at 0x7fd22a59ff90>" (Invalid data found when processing input).

Versions

PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.12 (main, Apr  5 2022, 06:56:58)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] numpydoc==1.2
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.6                   pypi_0    pypi
[conda] numpydoc                  1.2                pyhd3eb1b0_0  
[conda] torch                     1.12.1                   pypi_0    pypi
[conda] torchaudio                0.12.1                   pypi_0    pypi
[conda] torchvision               0.13.1                   pypi_0    pypi

Sep 16 '22 09:09 Jackiexiao

Hi

The buffer needs to be rewound before reading it again.

buffer_.seek(0)
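
To illustrate why the rewind matters: a BytesIO has one cursor shared by reads and writes, so right after a write the cursor sits at the end of the data and a reader finds nothing there. A minimal standard-library sketch (the payload is a made-up stand-in, not a real WAV file, and torchaudio is not involved):

```python
import io

buffer_ = io.BytesIO()
buffer_.write(b"RIFF....WAVEfmt ")  # stand-in payload for illustration only

# After writing, the cursor sits at the end of the data.
pos_after_write = buffer_.tell()
leftover = buffer_.read()  # nothing left to read from this position

buffer_.seek(0)  # rewind to the start
data = buffer_.read()

print(pos_after_write, leftover, data)
```

The same applies to the buffer filled by torchaudio.save in the report above: it has to be rewound before StreamReader reads from it.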

Sep 16 '22 11:09 mthrok

thx, my mistake

Sep 16 '22 15:09 Jackiexiao

@mthrok thx for your help. Can we change buffer_ (BytesIO) after initializing StreamReader? It would be useful for sending streaming tensor audio to FFmpeg, but I get the error RuntimeError: Failed to process a packet. (Invalid argument). and I have no idea how to solve it.

import io
import torchaudio
from torchaudio.io import StreamReader

wav_file = "demo.wav" # 16khz audio
waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()

# this works perfectly
# buffer_.write(waveform.numpy().tobytes())
# buffer_.seek(0)
streamer = StreamReader(
    buffer_,
    format='f32le',
    option={"sample_rate": "16000"},
)
# error: RuntimeError: Failed to process a packet. (Invalid argument). 
buffer_.write(waveform.numpy().tobytes())
buffer_.seek(0)

streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc=f"aresample=8000,aformat=sample_fmts=fltp",
)

for multi_streams_chunk in streamer.stream():
    print(multi_streams_chunk[0].shape)

Sep 20 '22 09:09 Jackiexiao

When a source object is passed to StreamReader, the StreamReader parses the metadata, so it needs at minimum the header data to be available. How it parses the metadata depends on the media format. In your case, you are passing format='f32le', which is headerless, so in theory passing an empty BytesIO could work, but it is also possible that the parsing mechanism still attempts to read some data. (This is a detail abstracted away even in the FFmpeg codebase, so I cannot say for sure.)

You can investigate how StreamReader is using the file-like object by wrapping the instance with a dummy class which logs the calls to read and seek.

class Wrapper:
    def __init__(self, buffer):
        self.buffer = buffer

    def read(self, n):
        print(f"read: {n} bytes")
        return self.buffer.read(n)

    def seek(self, offset, whence):
        print(f"seek: {offset}, {whence}")
        return self.buffer.seek(offset, whence)

If read and seek are happening in the constructor of StreamReader, then some data must already be in the buffer so that the BytesIO does not reach EOF.

Sep 20 '22 10:09 mthrok

yes, read and seek are happening in the constructor of StreamReader

read: 4096 bytes
read: 4096 bytes
read: 4096 bytes
read: 4096 bytes
seek: -1, 2
seek: 0, 0
seek: -1, 2
seek: 0, 0
seek: -1, 2
seek: 0, 0

Sep 20 '22 10:09 Jackiexiao

it looks like we can't add more data to the BytesIO after initializing StreamReader

import io
import torchaudio
from torchaudio.io import StreamReader

wav_file = "demo.wav" # 16khz audio
waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()

num = 8000
buffer_.write(waveform[:num].numpy().tobytes())
buffer_.seek(0)
streamer = StreamReader(
    buffer_,
    format='f32le',
    option={"sample_rate": "16000"},
)
position = buffer_.tell()  # which is 8000, since the constructor of StreamReader does `read` and `seek`
buffer_.write(waveform[8000:].numpy().tobytes())

streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc=f"aresample=8000,aformat=sample_fmts=fltp",
)

for multi_streams_chunk in streamer.stream():
    print(multi_streams_chunk[0].shape)

still, RuntimeError: Failed to process a packet. (Invalid argument).

Sep 20 '22 11:09 Jackiexiao

The issue with this approach is that StreamReader detects EOF during construction. Since a plain io.BytesIO object has a seek method, StreamReader uses it to detect the EOF and mark it. You write to the buffer object and extend the content after that, but StreamReader has already seen the original EOF, so any data found after that point is treated as invalid.

To feed data interactively, you need to write a custom file object class with a read method, fill the buffer in the read method as necessary, and make sure that the read method does not return an empty byte string until you want to finish processing.

Sep 20 '22 13:09 mthrok

For example, this will generate infinite noise. Similarly, you should perform buffer re-filling in the read method. Also note the lack of a seek method, so that the behavior is truly streaming: no peeking ahead, no rewinding. See https://pytorch.org/audio/stable/tutorials/streaming_api_tutorial.html#file-like-objects

import torch
from torchaudio.io import StreamReader


class InfiniteNoise:
    def __init__(self):
        pass

    def read(self, n):
        print(f"read: {n} bytes")
        return torch.randn((n // 4, )).numpy().tobytes()


buf = InfiniteNoise()

streamer = StreamReader(
    buf,
    format='f32le',
    option={"sample_rate": "16000"},
)

streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc="aresample=8000,aformat=sample_fmts=fltp",
)

for i, (chunk, ) in enumerate(streamer.stream()):
    print(i, chunk.shape)

    if i > 5:
        break
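
Along the same lines, here is a rough sketch of the "refill in read" idea for data that arrives over time. QueueReader and the None end marker are hypothetical names and conventions chosen for illustration, not part of torchaudio. The sketch assumes that returning fewer than n bytes from read is acceptable, as with ordinary file reads. It uses only the standard library and has not been run against StreamReader.

```python
import queue


class QueueReader:
    """File-like object with only `read`, refilled from a queue of byte chunks."""

    def __init__(self, chunks):
        self._chunks = chunks    # queue.Queue of bytes; None marks the end (assumed convention)
        self._pending = b""      # bytes received but not yet handed out
        self._finished = False

    def read(self, n):
        # Block until some data is available, so that b"" is only
        # returned once the producer has signalled the end.
        while not self._pending and not self._finished:
            chunk = self._chunks.get()
            if chunk is None:
                self._finished = True
            else:
                self._pending = chunk
        out, self._pending = self._pending[:n], self._pending[n:]
        return out


chunks = queue.Queue()
for item in (b"\x00" * 6, b"\x01" * 3, None):
    chunks.put(item)

reader = QueueReader(chunks)
results = [reader.read(4) for _ in range(4)]
print(results)
```

In a real setup a producer thread would put waveform.numpy().tobytes() slices on the queue, and the reader would be passed to StreamReader with format='f32le' in place of InfiniteNoise. Like InfiniteNoise, it has no seek method, so nothing can peek ahead to find an early EOF.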

Sep 20 '22 13:09 mthrok
