torchaudio.transforms.SpectralCentroid supports only real-valued input
🐛 Describe the bug
The SpectralCentroid transform supports only real-valued inputs. If complex values are provided, it raises an error: `RuntimeError: Cannot have onesided output if window or input is complex`. I've tracked down the cause: there is no way to pass a `onesided` parameter through the layers, so the call to the spectrogram function uses its default value, `onesided=True`, which triggers the error.
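To isolate the root cause, here is a minimal sketch that calls `torch.stft` directly. It assumes, as described above, that SpectralCentroid ultimately reaches `torch.stft` through the spectrogram function with `onesided=True`. The sketch shows that `torch.stft` accepts `onesided=True` for real input, rejects it for complex input, and returns all `n_fft` bins when `onesided=False`.

```python
import torch

n_fft = 400
window = torch.hann_window(n_fft)
t = torch.linspace(0, 300 * 3.14, 8000)
real_wave = torch.sin(t)
complex_wave = torch.exp(1j * t)  # complex-valued signal, shape (8000,)

# Real input: a one-sided spectrum is allowed and has n_fft // 2 + 1 bins.
real_spec = torch.stft(real_wave, n_fft, window=window, onesided=True, return_complex=True)

# Complex input with onesided=True is rejected by torch.stft itself.
onesided_error = None
try:
    torch.stft(complex_wave, n_fft, window=window, onesided=True, return_complex=True)
except RuntimeError as err:
    onesided_error = str(err)

# Complex input with onesided=False keeps all n_fft bins.
complex_spec = torch.stft(complex_wave, n_fft, window=window, onesided=False, return_complex=True)

print(real_spec.shape)     # torch.Size([201, 81])
print(complex_spec.shape)  # torch.Size([400, 81])
print(onesided_error)
```

The exact error text may vary between PyTorch versions. The frame count of 81 follows from the defaults `hop_length = n_fft // 4` and `center=True`.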
I think this could be fixed easily, and I can open a PR for it.
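Code to reproduce the issue:

```python
import torch
from torchaudio.transforms import SpectralCentroid

start = 0
end = 300 * 3.14
n_points = 8000
theta = torch.linspace(start, end, n_points)

# Complex-valued test signal with shape (1, n_points)
waveform = torch.sin(theta) + 1j * torch.sin(theta)
waveform = waveform.unsqueeze(0)

n_fft = 400
# Samples per unit of theta (the reciprocal of the sample spacing)
sample_rate = n_points / (end - start)

# Raises: RuntimeError: Cannot have onesided output if window or input is complex
spec = SpectralCentroid(sample_rate=sample_rate, n_fft=n_fft)(waveform)
```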
Versions
```
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-4.14.294-220.533.amzn2.x86_64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 3103.732
BogoMIPS: 4999.98
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 4 MiB
L3 cache: 35.8 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Vulnerable
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.2
[pip3] sagemaker-pytorch-training==2.4.0
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.11.3+cpu
[conda] mkl 2020.2 256 anaconda
[conda] mkl-include 2020.2 256 anaconda
[conda] numpy 1.22.2 pypi_0 pypi
[conda] sagemaker-pytorch-training 2.4.0 pypi_0 pypi
[conda] torch 1.13.1 pypi_0 pypi
[conda] torchaudio 0.13.1 pypi_0 pypi
[conda] torchvision 0.11.3+cpu pypi_0 pypi
```
Thanks @rgt-yncrea. We're wondering whether it makes sense to pass a complex-valued waveform to the spectral centroid function. What's the use case? Thanks!
The use case is processing complex-valued RF radar signals. In my case, the spectrogram encodes a velocity that can be either positive or negative, corresponding to positive and negative frequencies in the spectrum, respectively. Hence the need for a two-sided FFT.
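For this use case, a minimal sketch of a two-sided spectral centroid, written directly against `torch.stft` rather than torchaudio. The helper name `two_sided_spectral_centroid` is hypothetical and not part of torchaudio; the sketch assumes a complex waveform, a Hann window, and magnitude weighting. It suggests that passing `onesided=False` alone is likely not enough: the centroid also needs one frequency value per bin, including the negative ones, which `torch.fft.fftfreq` provides in the same order as the FFT output.

```python
import math

import torch


def two_sided_spectral_centroid(waveform, sample_rate, n_fft=400, hop_length=None):
    """Hypothetical helper: spectral centroid over signed (two-sided) frequencies.

    waveform: complex tensor of shape (..., time). Returns a tensor of shape (..., frames).
    """
    # Hann window whose dtype matches the real dtype of the complex input.
    window = torch.hann_window(n_fft, dtype=waveform.real.dtype, device=waveform.device)
    # torch.stft accepts (time,) or (batch, time), so flatten any leading dimensions.
    batch_shape = waveform.shape[:-1]
    flat = waveform.reshape(-1, waveform.shape[-1])
    # onesided=False keeps all n_fft bins, including negative frequencies.
    spec = torch.stft(flat, n_fft, hop_length=hop_length, window=window,
                      onesided=False, return_complex=True)  # (batch, n_fft, frames)
    mag = spec.abs()
    # Signed bin frequencies in FFT order: 0, positive bins, then negative bins.
    freqs = torch.fft.fftfreq(n_fft, d=1.0 / sample_rate, dtype=mag.dtype, device=mag.device)
    centroid = (freqs.reshape(-1, 1) * mag).sum(dim=-2) / mag.sum(dim=-2)
    return centroid.reshape(*batch_shape, -1)


# Usage: complex tones at +1000 Hz and -1000 Hz, sampled at 8 kHz for one second.
sample_rate = 8000
t = torch.arange(sample_rate, dtype=torch.float64) / sample_rate
tone_up = torch.exp(2j * math.pi * 1000 * t).unsqueeze(0)
tone_down = torch.exp(-2j * math.pi * 1000 * t).unsqueeze(0)
c_up = two_sided_spectral_centroid(tone_up, sample_rate)
c_down = two_sided_spectral_centroid(tone_down, sample_rate)
# Interior frames avoid the reflect padding applied by center=True.
print(c_up[0, 5:-5].mean().item(), c_down[0, 5:-5].mean().item())
```

The interior frames should give a centroid close to +1000 Hz for the first tone and close to -1000 Hz for the second, so the sign carries the direction of motion. The first and last few frames are skewed because reflect padding mirrors the signal, which introduces energy at the opposite-sign frequency.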
Thanks @rgt-yncrea. We discussed this as a team and agree that it makes sense. Can you go ahead and open a PR? Thanks!
@xiaohui-zhang, I'm working on it. I'm struggling with the local build of torchaudio (related to this issue). Once I manage to build it, I'll open the PR.