
specify fmin and fmax for Spectrogram

Open bilzard opened this issue 1 year ago • 2 comments

🚀 The feature

Allow specifying fmin and fmax for Spectrogram, as is already possible for MelSpectrogram.

Motivation, pitch

We can specify fmin and fmax for MelSpectrogram, but not for Spectrogram. If we only need a specific frequency band, computing the frequencies outside it costs extra memory and computation. This feature would also make the specifications of the Spectrogram and MelSpectrogram transforms consistent.

Alternatives

I don't know of a current workaround that achieves both of the following:

  1. specify fmin and fmax
  2. extract linear filter banks

Additional context

No response

bilzard avatar Jan 20 '24 06:01 bilzard

I misunderstood the current implementation of MelSpectrogram. It is just a combination of the Spectrogram and MelScale transforms [1]. So MelSpectrogram computes the full Spectrogram first, and its computational cost is no lower than that of Spectrogram.

Nevertheless, I am still interested in whether fmin and fmax could be specified directly in the Spectrogram transform. As I understand it, this is technically possible and would reduce computation and memory cost in the cases I mentioned above.

  • [1] https://pytorch.org/audio/main/generated/torchaudio.transforms.MelSpectrogram.html#torchaudio.transforms.MelSpectrogram
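Until such an option exists, one partial workaround is to compute the full Spectrogram and keep only the frequency bins inside the band. The sketch below only works out which bins to keep. It assumes a one-sided STFT with `n_fft // 2 + 1` bins, where bin `k` sits at `k * sample_rate / n_fft` Hz. The helper name `band_to_bin_slice` is hypothetical and not part of torchaudio. Note that this only reduces memory for downstream steps; the full STFT is still computed.

```python
import math


def band_to_bin_slice(fmin, fmax, sample_rate, n_fft):
    """Return the slice of one-sided STFT bins whose centers lie in [fmin, fmax].

    Hypothetical helper (not a torchaudio API). Assumes bin k is centered at
    k * sample_rate / n_fft Hz, for k = 0 .. n_fft // 2.
    """
    if not 0 <= fmin <= fmax <= sample_rate / 2:
        raise ValueError("need 0 <= fmin <= fmax <= sample_rate / 2")
    hz_per_bin = sample_rate / n_fft
    lo = math.ceil(fmin / hz_per_bin)   # first bin at or above fmin
    hi = math.floor(fmax / hz_per_bin)  # last bin at or below fmax
    return slice(lo, hi + 1)


# Example: 200 Hz signal, n_fft = 100, so each bin is 2 Hz wide.
bins = band_to_bin_slice(fmin=5, fmax=20, sample_rate=200, n_fft=100)
```

Assuming the spectrogram has shape `(..., freq, time)`, the band would then be taken with `spec[..., bins, :]`.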

bilzard avatar Jan 20 '24 06:01 bilzard

I found a workaround for fmin=0 Hz.

We can simply down-sample the original sequence until the Nyquist frequency of the new sampling rate comes down to the upper limit of the band we want. E.g., if we only want the 0-20 Hz frequency band and the original sampling frequency is 200 Hz, we can down-sample the original sequence to 40 Hz (1/5 of the original rate, Nyquist frequency 20 Hz) and pass it to the STFT.

The issue remains for fmin > 0 Hz, but in my case (fmin = 0 Hz) it is solved.
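The arithmetic behind the down-sampling workaround can be sketched as follows: pick the largest integer factor whose new Nyquist frequency still covers fmax. The helper name `decimation_for_band` is hypothetical and only does the bookkeeping; the actual resampling (with its anti-aliasing low-pass filter) would be done separately, for example with `torchaudio.functional.resample(waveform, orig_freq, new_freq)`.

```python
def decimation_for_band(sample_rate, fmax):
    """Return (factor, new_rate) for the largest integer down-sampling factor
    such that the new Nyquist frequency (new_rate / 2) is still >= fmax.

    Hypothetical helper (not a torchaudio API).
    """
    if not 0 < fmax <= sample_rate / 2:
        raise ValueError("need 0 < fmax <= sample_rate / 2")
    # new_rate / 2 >= fmax  <=>  factor <= sample_rate / (2 * fmax)
    factor = max(int(sample_rate // (2 * fmax)), 1)
    return factor, sample_rate / factor


# Example from the comment above: 200 Hz original rate, 0-20 Hz band of interest.
factor, new_rate = decimation_for_band(sample_rate=200, fmax=20)
```

For the example in this comment this gives a factor of 5 and a new rate of 40 Hz. In practice the resampler's low-pass filter may attenuate content close to the new Nyquist frequency, so leaving some headroom above fmax may be worthwhile.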

bilzard avatar Jan 20 '24 07:01 bilzard