
Loading audiofiles with offset is broken since 0.10

Open RuABraun opened this issue 2 years ago • 4 comments

🚀 The feature

Describe

I had to downgrade to 0.9 to get this working. I tried both 0.10 and the nightly build, and both fail:

>>> import torchaudio                                                                                                                                                                                      
>>> torchaudio.load('/path/to/wav/5ccae615b4e948578998a20f-wav.wav', frame_offset=10351280, num_frames=67232)
trim: Error parsing position 1
trim: usage: {position}                      
Traceback (most recent call last):            
  File "<stdin>", line 1, in <module>            
  File "/home/rudolf/miniconda3/envs/k2/lib/python3.8/site-packages/torchaudio-0.10.1+6f539cf-py3.8-linux-x86_64.egg/torchaudio/backend/sox_io_backend.py", line 152, in load
    return torch.ops.torchaudio.sox_io_load_audio_file(         
  File "/home/rudolf/miniconda3/envs/k2/lib/python3.8/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Invalid effect option: trim 10,351,280s +67,232s      

Versions

PyTorch version: 1.13.0.dev20220601+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.31

Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-113-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 515.43.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] k2==1.15.1.dev20220601+cuda11.6.torch1.13.0.dev20220601
[pip3] numpy==1.22.3
[pip3] torch==1.13.0.dev20220601+cu116
[pip3] torchaudio==0.9.0a0+a85b239
[conda] k2                        1.15.1.dev20220601+cuda11.6.torch1.13.0.dev20220601          pypi_0    pypi
[conda] mkl                       2022.0.1           h06a4308_117  
[conda] mkl-include               2022.0.1           h06a4308_117  
[conda] numpy                     1.22.3           py38h7a5d4dd_0  
[conda] numpy-base                1.22.3           py38hb8be1f0_0  
[conda] torch                     1.13.0.dev20220601+cu116          pypi_0    pypi
[conda] torchaudio                0.9.0a0+a85b239          pypi_0    pypi

(The output above is from the downgraded, working environment. As mentioned, the failing setup had torchaudio>=0.10.)

Motivation, pitch

.

Alternatives

No response

Additional context

I could not create a bug report so I had to create a feature request. You probably want to fix that too.

RuABraun avatar Jun 02 '22 10:06 RuABraun

Hi @RuABraun

Thanks for the report, can you tell me the sample rate and the duration of your audio file?

mthrok avatar Jun 03 '22 17:06 mthrok

It works fine in my environment. What looks peculiar to me is that it generates the effect command trim 10,351,280s +67,232s, when it is supposed to generate trim 10351280s +67232s.

The corresponding code is here

https://github.com/pytorch/audio/blob/f0bc00c980012badea8db011f84a0e9ef33ba6c1/torchaudio/csrc/sox/io.cpp#L49-L61

which has not been changed since v0.8.

I don't know exactly why this is happening, but does your application somehow change the configuration of C++ iostream behavior?

mthrok avatar Jun 03 '22 17:06 mthrok

What is your LC_ALL env value? What happens if it's set to LC_ALL=C?

mthrok avatar Jun 03 '22 17:06 mthrok

Hi! Some additional info

numsamples: 24,992,427
seconds: 1562.026687
fs: 16000

LC_ALL was unset, same result if I set it to C.

Interesting point about the commas. I'm not aware of anything being done to iostream. Thank you for pointing out the relevant code. I don't have time right now, but I may try poking around in it later.

RuABraun avatar Jun 03 '22 19:06 RuABraun

Closing the issue, as this is not reproducible on my end and it does not seem to be a widely encountered issue. Feel free to report back if anyone else runs into the same problem.

mthrok avatar Aug 01 '23 01:08 mthrok