
Spectrogram transform with float16 precision

Open Guillaume-oso opened this issue 3 years ago • 2 comments

🚀 The feature

It looks like the spectrogram transform does not work with float16 precision.

Versions: Python 3.8, torch 1.10.1, torchaudio 0.10.1, typing-extensions 3.10.0.2, OS: Ubuntu 20.04

The code I used to test whether this works:

import torch
import torchaudio

batch = torch.rand(20, 1, 153600)

precision = "float16"

spec_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    win_length=600,
    hop_length=320,
    f_min=20,
    f_max=8000,
    n_mels=128,
)
# Cast both the input batch and the transform's parameters/buffers to float16.
batch = batch.to(getattr(torch, precision))
spec_transform = spec_transform.to(getattr(torch, precision))

spec_transform(batch)  # raises RuntimeError (see tracebacks below)

Error (CPU execution):

Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 113, in spectrogram
    spec_f = torch.stft(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/functional.py", line 570, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 4179, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: "reflection_pad1d" not implemented for 'Half'

With GPU execution enabled:

Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 134, in spectrogram
    return spec_f.abs().pow(power)
RuntimeError: "abs_cuda" not implemented for 'ComplexHalf'

Motivation, pitch

Reduce training and inference computation cost in an audio deep learning context.

Alternatives

No response

Additional context

No response

Guillaume-oso avatar Dec 23 '21 17:12 Guillaume-oso

Hi @Guillaume-oso

At the moment, torchaudio's official support is limited to fp32 and fp64. fp16 support is of interest to us, but looking at the stack trace, we would need the missing ops to be implemented in PyTorch core. I will ask the team and see how feasible that is.
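
In the meantime, one way to reduce compute cost while staying within the supported dtypes is to keep the feature extraction in fp32 and run only the downstream model in half precision, for example via torch.cuda.amp.autocast. A minimal sketch (the Conv2d is just a placeholder model, and a CUDA device is assumed):

import torch
import torchaudio

# Feature extraction stays in float32; only the model runs under autocast.
spec_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, win_length=600, hop_length=320,
    f_min=20, f_max=8000, n_mels=128,
).cuda()
model = torch.nn.Conv2d(1, 16, kernel_size=3).cuda()  # placeholder model

waveform = torch.rand(20, 1, 153600, device="cuda")   # float32 waveforms

features = spec_transform(waveform)                   # computed in float32
with torch.cuda.amp.autocast():
    out = model(features)                             # fp16 where supported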

mthrok avatar Dec 23 '21 19:12 mthrok

Hi @Guillaume-oso

Please refer to the latest on half-complex support in pytorch/pytorch#71680 and add your voice there.

mthrok avatar Jan 24 '22 14:01 mthrok