convmelspec

DFT and Torchaudio mode

Open hbellafkir opened this issue 1 year ago • 0 comments

Switching between torchaudio and DFT mode produces different outputs for the same input. To reproduce:

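import scipy.signal as sig
import torch
# Import path assumed from the convmelspec README; adjust if your install differs.
from convmelspec.stft import ConvertibleSpectrogram as Spectrogram

FFT_SIZE = 1024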
SR = 16_000
HOP_SIZE = 512
MEL_BANDS = 64

x = torch.rand(1, SR)


wn = sig.windows.hann(FFT_SIZE, sym=True)

stft = Spectrogram(
    sr=SR,
    n_fft=FFT_SIZE,
    hop_size=HOP_SIZE,
    n_mel=None,
    padding=0,
    window=wn,
    spec_mode="DFT",
    dtype=torch.float32,
)


stft_ta = Spectrogram(
    sr=SR,
    n_fft=FFT_SIZE,
    hop_size=HOP_SIZE,
    n_mel=None,
    padding=0,
    window=wn,
    spec_mode="torchaudio",
    dtype=torch.float32,
)

# Raises AssertionError: the two modes disagree beyond atol=1e-2
assert torch.allclose(stft(x), stft_ta(x), atol=1e-2)

Given this mismatch, training in torchaudio mode and then exporting in DFT mode does not seem safe. Am I doing something wrong?
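One way to narrow this down might be to compare each mode against an independent reference rather than against each other. The sketch below is a dependency-free direct DFT of a single windowed frame. It also shows how much a symmetric Hann window (as in `sig.windows.hann(..., sym=True)`) and a periodic one can differ. The helper names `hann` and `dft_power` are hypothetical, the sizes are deliberately tiny, and nothing here establishes that either mode ignores or reinterprets the `window` argument, or that both modes return power rather than magnitude. Those are assumptions to check.

```python
import cmath
import math


def hann(n, sym=True):
    """Hann window: sym=True is the symmetric form, sym=False the periodic one."""
    denom = n - 1 if sym else n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / denom) for i in range(n)]


def dft_power(frame, window):
    """One-sided power spectrum |X[k]|^2 of a windowed frame via a direct DFT."""
    n = len(frame)
    xw = [s * w for s, w in zip(frame, window)]
    out = []
    for k in range(n // 2 + 1):
        acc = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(acc) ** 2)
    return out


# Small illustrative sizes, not the 1024-point setup from the report.
N = 16
frame = [math.sin(2.0 * math.pi * 2 * t / N) for t in range(N)]  # tone at bin 2

ref_sym = dft_power(frame, hann(N, sym=True))   # symmetric window
ref_per = dft_power(frame, hann(N, sym=False))  # periodic window

# Bin with the most energy, and the largest per-bin gap between the two windows.
peak_bin = max(range(len(ref_per)), key=ref_per.__getitem__)
window_gap = max(abs(a - b) for a, b in zip(ref_sym, ref_per))
print(peak_bin, window_gap)
```

If the first frame of one mode matches `dft_power` with the window that was passed in and the other mode does not, that would point to where the two paths diverge.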

hbellafkir, May 05 '24 09:05