Loading audio files with an offset is broken since 0.10
🚀 The feature
Describe
I had to downgrade to torchaudio 0.9 to get this to work. I tried both the nightly build and 0.10, and both fail:
>>> import torchaudio
>>> torchaudio.load('/path/to/wav/5ccae615b4e948578998a20f-wav.wav', frame_offset=10351280, num_frames=67232)
trim: Error parsing position 1
trim: usage: {position}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rudolf/miniconda3/envs/k2/lib/python3.8/site-packages/torchaudio-0.10.1+6f539cf-py3.8-linux-x86_64.egg/torchaudio/backend/sox_io_backend.py", line 152, in load
    return torch.ops.torchaudio.sox_io_load_audio_file(
  File "/home/rudolf/miniconda3/envs/k2/lib/python3.8/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Invalid effect option: trim 10,351,280s +67,232s
Versions
PyTorch version: 1.13.0.dev20220601+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.31
Python version: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-113-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 515.43.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] k2==1.15.1.dev20220601+cuda11.6.torch1.13.0.dev20220601
[pip3] numpy==1.22.3
[pip3] torch==1.13.0.dev20220601+cu116
[pip3] torchaudio==0.9.0a0+a85b239
[conda] k2 1.15.1.dev20220601+cuda11.6.torch1.13.0.dev20220601 pypi_0 pypi
[conda] mkl 2022.0.1 h06a4308_117
[conda] mkl-include 2022.0.1 h06a4308_117
[conda] numpy 1.22.3 py38h7a5d4dd_0
[conda] numpy-base 1.22.3 py38hb8be1f0_0
[conda] torch 1.13.0.dev20220601+cu116 pypi_0 pypi
[conda] torchaudio 0.9.0a0+a85b239 pypi_0 pypi
(As mentioned above, this is the working environment after the downgrade to torchaudio 0.9; the failing setup had torchaudio>=0.10.)
Motivation, pitch
.
Alternatives
No response
Additional context
I could not create a bug report so I had to create a feature request. You probably want to fix that too.
Hi @RuABraun
Thanks for the report. Can you tell me the sample rate and the duration of your audio file?
It works fine in my environment. What seems peculiar to me is that it generates the effect command trim 10,351,280s +67,232s, where it is supposed to generate trim 10351280s +67232s.
The corresponding code is here
https://github.com/pytorch/audio/blob/f0bc00c980012badea8db011f84a0e9ef33ba6c1/torchaudio/csrc/sox/io.cpp#L49-L61
which has not been changed since v0.8.
I don't know exactly why this is happening, but does your application somehow change the configuration of C++ iostream behavior?
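To make the difference between the two commands concrete, here is a minimal Python sketch of the formatting involved. It is only an analogue of the string building, not the torchaudio C++ code; the function name build_trim_args and its grouped flag are hypothetical and exist only for this illustration.

```python
def build_trim_args(frame_offset: int, num_frames: int, grouped: bool = False) -> str:
    """Build a sox-style trim argument string of the form '<offset>s +<length>s'."""
    if grouped:
        # Mimics a number formatter that inserts thousands separators,
        # which yields the string seen in the error message.
        return f"{frame_offset:,}s +{num_frames:,}s"
    # Plain formatting, which is the form the maintainer expects.
    return f"{frame_offset}s +{num_frames}s"


expected = build_trim_args(10351280, 67232)
observed = build_trim_args(10351280, 67232, grouped=True)
print("trim", expected)  # trim 10351280s +67232s
print("trim", observed)  # trim 10,351,280s +67,232s
```

The grouped variant matches the string in the RuntimeError, which is consistent with the idea that something in the process is applying digit grouping when the numbers are written out.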
What is your LC_ALL env value? What happens if it's set to LC_ALL=C?
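One way to collect this information from inside the affected Python process is sketched below, using only the standard library. The helper name locale_report is hypothetical. Note the assumption: this only reflects the environment variables and the C-level locale as Python sees them, so a C++ stream that was given a custom locale elsewhere in the process would not necessarily show up here.

```python
import locale
import os


def locale_report() -> dict:
    """Collect the locale-related settings visible to this Python process."""
    names = ("LC_ALL", "LC_NUMERIC", "LANG")
    # None means the variable is unset in the environment.
    report = {name: os.environ.get(name) for name in names}
    # localeconv() describes the numeric formatting rules of the current C locale.
    conv = locale.localeconv()
    report["thousands_sep"] = conv["thousands_sep"]
    report["grouping"] = conv["grouping"]
    return report


for key, value in locale_report().items():
    print(f"{key}: {value!r}")
```

A non-empty thousands_sep in the output would point toward a locale setting as the source of the commas.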
Hi! Some additional info
numsamples: 24,992,427
seconds: 1562.026687
fs: 16000
LC_ALL was unset, same result if I set it to C.
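As a quick sanity check on these numbers, the following sketch confirms that the requested segment lies inside the file, so the offset itself is not out of range. It uses only the values quoted in this thread; the variable names are chosen for illustration.

```python
# Values quoted in this thread.
num_samples = 24_992_427
sample_rate = 16_000
frame_offset = 10_351_280
num_frames = 67_232

duration_s = num_samples / sample_rate      # about 1562.03 seconds
end_frame = frame_offset + num_frames       # last requested frame (exclusive)

print(f"duration: {duration_s:.6f} s")
print(f"segment starts at {frame_offset / sample_rate:.3f} s "
      f"and lasts {num_frames / sample_rate:.3f} s")
print("segment fits in file:", end_frame <= num_samples)
```

The duration agrees with the reported 1562.026687 seconds, and the segment ends well before the last sample, which leaves the formatting of the trim arguments as the remaining suspect.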
Interesting point about the commas. I'm not aware of anything being done to iostream. Thank you for pointing out the relevant code. I don't have time right now, but I may try poking around in it later.
Closing the issue, as this is not reproducible on my end and it does not seem to be a widely recognized issue. Feel free to report back if anyone else runs into the same problem.