specify fmin and fmax for Spectrogram
🚀 The feature
Allow specifying fmin and fmax for Spectrogram, as is already possible for MelSpectrogram.
Motivation, pitch
We can specify fmin and fmax for MelSpectrogram, but not for Spectrogram.
If we only need a specific frequency band, computing the frequencies outside it costs extra memory and computation.
This feature would also make the Spectrogram and MelSpectrogram transforms consistent with each other.
Alternatives
I don't know of a current workaround for:
- specifying fmin and fmax
- extracting linear filter banks
Additional context
No response
I misunderstood the current implementation of MelSpectrogram.
It is just a combination of the Spectrogram and MelScale transforms [1].
So the computational cost of the current MelSpectrogram is essentially the same as that of Spectrogram; specifying fmin and fmax does not reduce it.
Nevertheless, I am still interested in whether fmin and fmax could be specified directly in the Spectrogram transform.
As I understand it, this is technically possible and would reduce computation and memory costs in the cases I mentioned above.
- [1] https://pytorch.org/audio/main/generated/torchaudio.transforms.MelSpectrogram.html#torchaudio.transforms.MelSpectrogram
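To make the potential saving concrete, here is a minimal sketch of how a [fmin, fmax] band maps onto bins of a one-sided STFT, assuming the usual layout of `n_fft // 2 + 1` bins spaced `sample_rate / n_fft` Hz apart. The helper `band_bin_range` and the example numbers are hypothetical and not part of torchaudio.

```python
import math


def band_bin_range(sample_rate, n_fft, fmin, fmax):
    """Return (lo, hi) so that bins lo..hi-1 of a one-sided STFT lie in [fmin, fmax].

    Hypothetical helper for illustration; assumes bin k sits at
    k * sample_rate / n_fft Hz.
    """
    if not 0 <= fmin < fmax <= sample_rate / 2:
        raise ValueError("need 0 <= fmin < fmax <= sample_rate / 2")
    lo = math.ceil(fmin * n_fft / sample_rate)       # first bin at or above fmin
    hi = math.floor(fmax * n_fft / sample_rate) + 1  # one past the last bin at or below fmax
    return lo, hi


# Example numbers (assumed): 200 Hz sampling rate, 0.5 Hz per bin, 5-20 Hz band.
sample_rate, n_fft = 200, 400
lo, hi = band_bin_range(sample_rate, n_fft, fmin=5, fmax=20)
total_bins = n_fft // 2 + 1
kept_fraction = (hi - lo) / total_bins

print(f"keep bins {lo}..{hi - 1} of {total_bins} ({kept_fraction:.1%})")
```

With a spectrogram shaped `(..., freq, time)`, the band could then be cropped with `spec[..., lo:hi, :]`. Note that cropping after the fact only saves memory downstream; the full STFT has still been computed, which is exactly the cost this request is about.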
I found a workaround for fmin=0 Hz.
We can simply down-sample the original sequence until the Nyquist frequency of the new sampling rate reaches the upper limit of the band we want. E.g., if we only want the 0-20 Hz frequency band and the original sampling frequency is 200 Hz, we can down-sample the original sequence to 40 Hz (1/5 of the original rate) and pass it to the STFT.
This is still an issue for fmin>0 Hz, but in my case (fmin=0 Hz) the issue is solved.
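The arithmetic behind the down-sampling workaround can be sketched as follows. The helper `plan_downsampling` is hypothetical; it assumes an integer decimation factor and shrinks `n_fft` by the same factor so that the Hz-per-bin resolution stays unchanged.

```python
def plan_downsampling(sample_rate, fmax, n_fft):
    """Pick the largest integer decimation factor whose new Nyquist frequency is >= fmax.

    Hypothetical helper for illustration. Returns (factor, new_rate, new_n_fft).
    """
    if not 0 < fmax <= sample_rate / 2:
        raise ValueError("need 0 < fmax <= sample_rate / 2")
    factor = int(sample_rate // (2 * fmax))  # new Nyquist = sample_rate / (2 * factor) >= fmax
    new_rate = sample_rate / factor
    new_n_fft = max(1, n_fft // factor)      # keeps roughly the same Hz per bin
    return factor, new_rate, new_n_fft


# The example from the comment above: 0-20 Hz band, 200 Hz original sampling rate.
factor, new_rate, new_n_fft = plan_downsampling(sample_rate=200, fmax=20, n_fft=400)
bins_before = 400 // 2 + 1
bins_after = new_n_fft // 2 + 1

print(f"down-sample by {factor} to {new_rate} Hz; "
      f"frequency bins: {bins_before} -> {bins_after}")
```

The actual down-sampling needs an anti-aliasing low-pass filter, which a resampler such as `torchaudio.functional.resample` applies internally. Because real filters have a transition band, leaving some headroom above fmax (a slightly smaller factor) may be safer than targeting the Nyquist limit exactly.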