Spectrogram transform with float16 precision
🚀 The feature
It looks like the spectrogram transform (MelSpectrogram in the test below) does not work with float16 precision.
Versions: Python 3.8, torch 1.10.1, torchaudio 0.10.1, typing-extensions 3.10.0.2. OS: Ubuntu 20.04.
My code to test if this feature works or not:
import torch
import torchaudio

# Random batch of 20 mono waveforms, 153600 samples each
batch = torch.rand(20, 1, 153600)
precision = "float16"

spec_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    win_length=600,
    hop_length=320,
    f_min=20,
    f_max=8000,
    n_mels=128,
)

# Cast both the input and the transform's buffers to half precision
batch = batch.to(getattr(torch, precision))
spec_transform = spec_transform.to(getattr(torch, precision))

# Raises RuntimeError (see the tracebacks below)
spec_transform(batch)
Error (CPU execution):
Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 113, in spectrogram
    spec_f = torch.stft(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/functional.py", line 570, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 4179, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: "reflection_pad1d" not implemented for 'Half'
With GPU execution enabled:
Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 134, in spectrogram
    return spec_f.abs().pow(power)
RuntimeError: "abs_cuda" not implemented for 'ComplexHalf'
Motivation, pitch
Reduce the computational cost of training and inference in audio deep learning.
Alternatives
No response
Additional context
No response
Hi @Guillaume-oso
At the moment, torchaudio officially supports only fp32 and fp64. We are interested in supporting fp16, but judging from the stack traces, these ops first need to be implemented in PyTorch core. I will ask the team and see how feasible that is.
Hi @Guillaume-oso
Please refer to pytorch/pytorch#71680 for the latest on complex half support, and voice your interest there.
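Until those ops are available in half precision, one possible workaround is to run only the spectrogram step in fp32 (which is officially supported, as noted above) and cast the result back to the caller's dtype. The sketch below is an assumption for illustration, not part of torchaudio: the wrapper name Float32Compute is hypothetical, and it relies only on standard torch.nn.Module and Tensor casting methods.

```python
import torch


class Float32Compute(torch.nn.Module):
    """Hypothetical wrapper: run `transform` in float32, return the input's dtype."""

    def __init__(self, transform: torch.nn.Module):
        super().__init__()
        # Keep the wrapped transform's buffers (e.g. window, filter bank) in float32.
        self.transform = transform.float()

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # Upcast the input so the unsupported half-precision ops are never reached.
        out = self.transform(waveform.float())
        # Cast back so the rest of the pipeline can stay in half precision.
        return out.to(waveform.dtype)
```

For the script in the report, this would be used as Float32Compute(spec_transform) with spec_transform left in its default float32 state. Two caveats: calling .half() on the wrapper afterwards would convert the inner transform too and bring the original errors back, and power spectrogram values can exceed the float16 maximum (65504), so some scaling or log compression before the final cast may be needed.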