Bark Filterbank for torchaudio
🚀 The feature
Is there any plan/interest to enable Bark spectrogram calculation in torchaudio?
Motivation, pitch
More flexibility for torchaudio users, especially for ML-DSP purposes.
Alternatives
No response
Additional context
No response
Hi @ahmed-fau
Thanks for the request. Extending torchaudio in the DSP domain is generally of interest to us. However, I am new to the Bark scale. Could you recommend any learning material?
After some quick googling and reading https://www.fon.hum.uva.nl/praat/manual/BarkSpectrogram.html, the procedure seems to be the following.
Waveform -> power spectrogram -> Bark scale conversion
So adding a Bark filterbank (plus, optionally, a BarkSpectrogram transform) would suffice. Is that what you had in mind?
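To make the last step of that pipeline concrete, here is a minimal, dependency-free sketch of a Bark filterbank applied to one power-spectrum frame. It rests on assumptions not stated in this thread: it uses Traunmüller's approximation of the Bark scale (one of several published formulas) and triangular filters spaced evenly on the Bark axis, the same way mel filterbanks are commonly built. The names `hz_to_bark`, `bark_to_hz`, `bark_filterbank` and `apply_filterbank` are hypothetical and are not torchaudio APIs.

```python
def hz_to_bark(f_hz):
    """Hz -> Bark, using Traunmueller's approximation (an assumption;
    other Bark formulas exist and give slightly different values)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Bark -> Hz, the algebraic inverse of hz_to_bark above."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_filterbank(n_freqs, n_barks, sample_rate, f_min=0.0, f_max=None):
    """Build an n_freqs x n_barks matrix of triangular filters.

    n_freqs is the number of bins of a one-sided spectrum
    (n_fft // 2 + 1) and must be at least 2.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    # Centre frequency (Hz) of every linear-frequency bin.
    freqs = [i * (sample_rate / 2.0) / (n_freqs - 1) for i in range(n_freqs)]
    # n_barks + 2 edge points, equally spaced on the Bark axis.
    z_min, z_max = hz_to_bark(f_min), hz_to_bark(f_max)
    step = (z_max - z_min) / (n_barks + 1)
    edges = [bark_to_hz(z_min + k * step) for k in range(n_barks + 2)]

    fb = [[0.0] * n_barks for _ in range(n_freqs)]
    for i, f in enumerate(freqs):
        for k in range(n_barks):
            lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
            rising = (f - lo) / (mid - lo)    # slope up to the peak
            falling = (hi - f) / (hi - mid)   # slope down from the peak
            fb[i][k] = max(0.0, min(rising, falling))
    return fb


def apply_filterbank(power_frame, fb):
    """Project one power-spectrum frame (length n_freqs) onto the bands."""
    n_barks = len(fb[0])
    return [
        sum(p * fb[i][k] for i, p in enumerate(power_frame))
        for k in range(n_barks)
    ]


# Example: 10 Bark bands over a 201-bin spectrum at 16 kHz.
fb = bark_filterbank(n_freqs=201, n_barks=10, sample_rate=16000)
bands = apply_filterbank([1.0] * 201, fb)
```

With a matrix of this shape, a Bark spectrogram is just the power spectrogram multiplied by the filterbank frame by frame, which is why the interface can mirror MelSpectrogram.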
Hi @mthrok
Exactly. It's the same interface as MelSpectrogram but with a different psychoacoustic scale (Bark instead of Mel), so adding a Bark filterbank is all we need.
The Bark scale has recently been used in efficient neural speech synthesis models such as LPCNet.
For the sake of completeness, you could also add another argument for the ERB (equivalent rectangular bandwidth) scale, which is also used in recent neural speech enhancement systems such as PercepNet.
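For reference, the ERB scale could be supported with the same kind of conversion helpers. The sketch below assumes the Glasberg and Moore formulas for the ERB bandwidth and the ERB-rate scale; the function names are hypothetical, and the band layout is only an illustration, not the one PercepNet actually uses.

```python
import math


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f_hz,
    per the Glasberg and Moore formula (assumed here)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Hz -> ERB-rate (number of ERBs below f_hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e):
    """ERB-rate -> Hz, the algebraic inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_spaced_frequencies(n_bands, f_min, f_max):
    """Return n_bands (>= 2) frequencies equally spaced on the ERB-rate axis."""
    e_min, e_max = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (e_max - e_min) / (n_bands - 1)
    return [erb_rate_to_hz(e_min + k * step) for k in range(n_bands)]


# Example: 32 band centres between 50 Hz and 8 kHz.
centres = erb_spaced_frequencies(32, 50.0, 8000.0)
```

These centres could then replace the Bark-spaced edge points in a filterbank builder, so a single `scale`-style argument would be enough to switch between the two.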
Hi @ahmed-fau
Sorry for the late response, but if you are still available, feel free to make a PR.
I just came across this post. Since we are talking about Bark scaling, are we talking about RASTA-related spectrogram energies?
Hi @mthrok, it's been a while since your comment about making a PR to add the Bark spectrogram to torchaudio, and there has not been any response. I have implemented it. May I make a PR?
Added as a prototype feature in #2823 and #2843, thanks @jdariasl