StreamReader Failed to open the input io.BytesIO
🐛 Describe the bug
StreamReader fails to open an io.BytesIO input that was written by torchaudio.save.
import io
import torchaudio
from torchaudio.io import StreamReader
wav_file = "demo.wav"
streamer = StreamReader(wav_file) # works fine
with open(wav_file, 'rb') as f:
    streamer = StreamReader(f) # works fine
waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()
torchaudio.save(buffer_, waveform, sample_rate, format="wav")
streamer = StreamReader(buffer_) # error
Traceback (most recent call last):
File "/home/jackie/code/temp.py", line 14, in <module>
streamer = StreamReader(buffer_)
File "/home/jackie/anaconda3/lib/python3.9/site-packages/torchaudio/io/_stream_reader.py", line 351, in __init__
self._be = torchaudio._torchaudio_ffmpeg.StreamReaderFileObj(src, format, option, buffer_size)
RuntimeError: Failed to open the input "<_io.BytesIO object at 0x7fd22a59ff90>" (Invalid data found when processing input).
Versions
PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] numpydoc==1.2
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.6 pypi_0 pypi
[conda] numpydoc 1.2 pyhd3eb1b0_0
[conda] torch 1.12.1 pypi_0 pypi
[conda] torchaudio 0.12.1 pypi_0 pypi
[conda] torchvision 0.13.1 pypi_0 pypi
Hi
The buffer needs to be rewound before reading it again.
buffer_.seek(0)
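A minimal sketch of why the rewind matters, using only the standard library. The payload is placeholder bytes rather than real WAV data (an assumption for illustration); the point is only that a write leaves the position at the end of the buffer, so a reader sees nothing until the buffer is rewound.

```python
import io

# Placeholder payload standing in for the bytes torchaudio.save would write.
payload = b"RIFF-demo-bytes"

buffer_ = io.BytesIO()
buffer_.write(payload)

# After writing, the position sits at the end of the buffer.
position_after_write = buffer_.tell()

# Reading from here yields nothing, which a decoder sees as empty input.
read_without_rewind = buffer_.read()

# Rewind, then read again: the full payload is available.
buffer_.seek(0)
read_after_rewind = buffer_.read()
```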
thx, my mistake
@mthrok thanks for your help. Could we modify buffer_ (a BytesIO) after initializing StreamReader? It would be useful for sending streaming tensor audio to FFmpeg, but I get the error RuntimeError: Failed to process a packet. (Invalid argument). and I have no idea how to solve it.
import io
import torchaudio
from torchaudio.io import StreamReader
wav_file = "demo.wav" # 16khz audio
waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()
# this works perfectly
# buffer_.write(waveform.numpy().tobytes())
# buffer_.seek(0)
streamer = StreamReader(
    buffer_,
    format='f32le',
    option={"sample_rate": "16000"},
)
# error: RuntimeError: Failed to process a packet. (Invalid argument).
buffer_.write(waveform.numpy().tobytes())
buffer_.seek(0)
streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc="aresample=8000,aformat=sample_fmts=fltp",
)
for multi_streams_chunk in streamer.stream():
    print(multi_streams_chunk[0].shape)
When a source object is passed to StreamReader, the StreamReader parses the metadata, so at minimum the header data needs to be available. How it parses the metadata depends on the media format. In your case, you are passing format='f32le', which is headerless, so in theory passing an empty BytesIO could work, but it is also possible that the parsing mechanism still attempts to read some data. (This detail is abstracted away even in the FFmpeg codebase, so I cannot say for sure.)
You can investigate how StreamReader uses the file-like object by wrapping the instance in a dummy class that logs the calls to read and seek.
class Wrapper:
    def __init__(self, buffer):
        self.buffer = buffer

    def read(self, n):
        print(f"read: {n} bytes")
        return self.buffer.read(n)

    def seek(self, offset, whence):
        print(f"seek: {offset}, {whence}")
        return self.buffer.seek(offset, whence)
If read and seek are called in the constructor of StreamReader, then some data needs to be buffered beforehand so that the BytesIO buffer does not reach EOF.
Yes, read and seek are called in the constructor of StreamReader:
read: 4096 bytes
read: 4096 bytes
read: 4096 bytes
read: 4096 bytes
seek: -1, 2
seek: 0, 0
seek: -1, 2
seek: 0, 0
seek: -1, 2
seek: 0, 0
It looks like we can't add more data to the BytesIO after initializing StreamReader.
import io
import torchaudio
from torchaudio.io import StreamReader
wav_file = "demo.wav" # 16khz audio
waveform, sample_rate = torchaudio.load(wav_file)
buffer_ = io.BytesIO()
num = 8000
buffer_.write(waveform[:num].numpy().tobytes())
buffer_.seek(0)
streamer = StreamReader(
    buffer_,
    format='f32le',
    option={"sample_rate": "16000"},
)
position = buffer_.tell() # which is 8000, since the constructor of StreamReader does `read` and `seek`
buffer_.write(waveform[8000:].numpy().tobytes())
streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc="aresample=8000,aformat=sample_fmts=fltp",
)
for multi_streams_chunk in streamer.stream():
    print(multi_streams_chunk[0].shape)
still, RuntimeError: Failed to process a packet. (Invalid argument).
The issue with this approach is that StreamReader detects EOF during construction. Since a plain io.BytesIO object has a seek method, StreamReader uses it to detect the EOF and marks it. You write to the buffer object and extend its content afterwards, but StreamReader has already seen the original EOF, so any data found after that point is treated as invalid.
To feed data interactively, you need to write a custom file object class with a read method, fill the buffer inside the read method as necessary, and make sure that read does not return an empty byte string until you want to finish processing.
For example, the InfiniteNoise class below generates infinite noise.
Similarly, you should perform buffer re-filling in the read method.
Also note the lack of a seek method, which makes the behavior truly streaming: no peeking ahead, no rewinding.
See https://pytorch.org/audio/stable/tutorials/streaming_api_tutorial.html#file-like-objects
import torch
from torchaudio.io import StreamReader
class InfiniteNoise:
    def __init__(self):
        pass

    def read(self, n):
        print(f"read: {n} bytes")
        return torch.randn((n // 4, )).numpy().tobytes()

buf = InfiniteNoise()
streamer = StreamReader(
    buf,
    format='f32le',
    option={"sample_rate": "16000"},
)
streamer.add_audio_stream(
    frames_per_chunk=2560,
    filter_desc="aresample=8000,aformat=sample_fmts=fltp",
)
for i, (chunk, ) in enumerate(streamer.stream()):
    print(i, chunk.shape)
    if i > 5:
        break
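Along the same lines, the buffer re-filling described above could be sketched as follows. This is only an illustration under stated assumptions: the class name QueueReader and its feed and close methods are hypothetical and not part of torchaudio, and the sketch uses only the standard library. It exposes read and deliberately no seek, blocks until data is available, and returns an empty byte string only after the producer signals the end.

```python
import queue


class QueueReader:
    """Hypothetical file-like object with read() only (no seek)."""

    def __init__(self):
        self._chunks = queue.Queue()  # raw byte chunks pushed by a producer
        self._pending = b""           # bytes received but not yet returned
        self._finished = False        # set once the end marker is seen

    def feed(self, data: bytes) -> None:
        """Producer side: queue more raw bytes (e.g. f32le samples)."""
        self._chunks.put(data)

    def close(self) -> None:
        """Producer side: signal that no more data will arrive."""
        self._chunks.put(None)

    def read(self, n: int) -> bytes:
        # Block until there is something to return, so that an empty
        # byte string is only ever returned after close() was called.
        while not self._pending and not self._finished:
            item = self._chunks.get()
            if item is None:
                self._finished = True
            else:
                self._pending = item
        out, self._pending = self._pending[:n], self._pending[n:]
        return out


# Small demonstration with placeholder bytes instead of audio samples.
reader = QueueReader()
reader.feed(b"abcdef")
reader.close()
first = reader.read(4)
second = reader.read(4)
third = reader.read(4)
```

An instance could presumably be passed to StreamReader in place of InfiniteNoise, with another thread calling feed as audio arrives; that combination is not tested here.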