convmelspec
DFT and Torchaudio mode
Switching between torchaudio and DFT mode produces different outputs for the same input and settings. To reproduce:
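import torch
from scipy import signal as sig
# Assumed import path; adjust if your installation exposes the class differently
from convmelspec.stft import ConvertibleSpectrogram as Spectrogram

FFT_SIZE = 1024
SR = 16_000
HOP_SIZE = 512
MEL_BANDS = 64  # unused here, since n_mel=None

x = torch.rand(1, SR)
wn = sig.windows.hann(FFT_SIZE, sym=True)

# Same settings for both instances; only spec_mode differs
stft = Spectrogram(
    sr=SR,
    n_fft=FFT_SIZE,
    hop_size=HOP_SIZE,
    n_mel=None,
    padding=0,
    window=wn,
    spec_mode="DFT",
    dtype=torch.float32,
)
stft_ta = Spectrogram(
    sr=SR,
    n_fft=FFT_SIZE,
    hop_size=HOP_SIZE,
    n_mel=None,
    padding=0,
    window=wn,
    spec_mode="torchaudio",
    dtype=torch.float32,
)

# Raises AssertionError: the outputs differ by more than atol=1e-2
assert torch.allclose(stft(x), stft_ta(x), atol=1e-2)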
Given this mismatch, training in torchaudio mode and then exporting in DFT mode does not look safe. Am I doing something wrong?
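To narrow down where the two outputs diverge, a small helper along these lines might be useful. It is only a sketch: `describe_mismatch` is a hypothetical name, not part of convmelspec, and it assumes both outputs are real-valued tensors. It reports whether the shapes agree, how large the error is, and the least-squares scale factor between the two tensors.

```python
import torch


def describe_mismatch(a: torch.Tensor, b: torch.Tensor) -> dict:
    """Summarize how two real-valued, spectrogram-like tensors differ.

    Hypothetical diagnostic helper, not part of convmelspec.
    """
    # Different shapes cannot be compared element-wise, so report them and stop
    if a.shape != b.shape:
        return {
            "same_shape": False,
            "shape_a": tuple(a.shape),
            "shape_b": tuple(b.shape),
        }

    diff = (a - b).abs()
    # Least-squares scale s that minimizes ||a - s * b||
    scale = (a * b).sum() / (b * b).sum().clamp_min(1e-12)
    return {
        "same_shape": True,
        "max_abs_diff": diff.max().item(),
        "max_rel_diff": (diff / b.abs().clamp_min(1e-8)).max().item(),
        "best_scale": scale.item(),
    }


# Toy check with synthetic tensors: b is exactly 2 * a
a = torch.ones(1, 4, 3)
b = 2.0 * a
report = describe_mismatch(a, b)
print(report)
```

In the script above this would be called as `describe_mismatch(stft(x), stft_ta(x))`. A shape mismatch would suggest the two modes frame or pad the signal differently. A `best_scale` far from 1 with otherwise similar values would suggest a normalization difference, or that one mode returns magnitude and the other power.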