Bark Filterbank for torchaudio

🚀 The feature

Is there any plan/interest to enable Bark spectrogram calculation in torchaudio?

Motivation, pitch

More flexibility for torchaudio users, especially for ML-DSP purposes.

Alternatives

No response

Additional context

No response

Dec 26 '21 08:12 ahmed-fau

Hi @ahmed-fau

Thanks for the request. Extending torchaudio in the DSP domain is generally of interest to us. However, I am new to the Bark scale. Could you recommend any learning material?

After some quick googling and reading https://www.fon.hum.uva.nl/praat/manual/BarkSpectrogram.html, it seems the procedure looks like the following:

Waveform -> power spectrogram -> Bark scale conversion

So adding a Bark filterbank (and, optionally, BarkSpectrogram) will suffice. Is that what you had in mind?
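For illustration only, the "Bark scale conversion" step above could be sketched as below. This is a minimal sketch under stated assumptions, not the implementation that was eventually merged: it assumes the Traunmüller approximation of the Bark scale (without its low and high frequency corrections) and mel-style triangular filters spaced evenly in Bark, and it assumes the power spectrum frame already comes from an STFT. The names `hz_to_bark`, `bark_to_hz`, `bark_filterbank`, and `apply_filterbank` are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Hz -> Bark, Traunmüller approximation (assumed; corrections omitted)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


def bark_filterbank(n_freqs, n_barks, sample_rate, f_min=0.0, f_max=None):
    """Return n_barks triangular filters, each a list of n_freqs weights.

    Assumption: triangles are spaced evenly on the Bark axis, analogous
    to a mel filterbank. Other filter shapes are possible.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    # Frequencies (Hz) of the linear FFT bins from 0 to Nyquist.
    nyquist = sample_rate / 2.0
    bin_hz = [i * nyquist / (n_freqs - 1) for i in range(n_freqs)]
    # n_barks + 2 edge points, equally spaced in Bark, mapped back to Hz.
    z_lo, z_hi = hz_to_bark(f_min), hz_to_bark(f_max)
    step = (z_hi - z_lo) / (n_barks + 1)
    edges = [bark_to_hz(z_lo + k * step) for k in range(n_barks + 2)]
    fbank = []
    for m in range(n_barks):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        fbank.append(row)
    return fbank


def apply_filterbank(power_frame, fbank):
    """Project one power-spectrum frame (n_freqs values) onto the Bark bands."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in fbank]


# Usage: 24 Bark bands for a 400-point FFT (201 bins) at 16 kHz.
fbank = bark_filterbank(n_freqs=201, n_barks=24, sample_rate=16000)
bark_frame = apply_filterbank([1.0] * 201, fbank)
```

Because only the filterbank differs from the mel case, a BarkSpectrogram transform could reuse the same spectrogram front end as MelSpectrogram.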

Dec 26 '21 13:12 mthrok

Hi @mthrok

Exactly, it's the same interface as MelSpectrogram but with a different psychoacoustic scale (Bark instead of Mel), so adding a Bark filterbank is all that we need.

The Bark scale has recently been used in efficient neural speech synthesis models such as LPCNet.

For the sake of completeness, you could also add another argument for the ERB (equivalent rectangular bandwidth) scale, which is also used in recent neural speech enhancement systems such as PercepNet.
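As a rough illustration of what an ERB option would involve, the sketch below converts between Hz and the ERB-rate scale. It assumes the Glasberg and Moore formula, which is one common choice; it does not reproduce the band layout of PercepNet or any other specific system, and the function names are hypothetical.

```python
import math


def hz_to_erb_rate(freq_hz):
    """Hz -> ERB-rate (Glasberg and Moore formula, assumed here)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def erb_rate_to_hz(erb):
    """Algebraic inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


# Band edges spaced evenly in ERB-rate between 0 Hz and 8 kHz.
n_bands = 32
top = hz_to_erb_rate(8000.0)
erb_edges_hz = [erb_rate_to_hz(k * top / (n_bands + 1)) for k in range(n_bands + 2)]
```

With edges like these, the same triangular filterbank construction used for Mel or Bark would apply; only the frequency warping changes.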

Dec 26 '21 16:12 ahmed-fau

Hi @ahmed-fau

Sorry for the late response, but if you are still available, feel free to make a PR.

Feb 07 '22 14:02 mthrok

I just came across this post. Are we talking about RASTA-related spectrogram energies, since we are talking about Bark scaling?

Apr 07 '22 19:04 underdogliu

Hi @mthrok, it's been a while since your comment about making a PR to add the Bark spectrogram to torchaudio, and there has not been any response. I have implemented it; may I make a PR?

Oct 31 '22 11:10 jdariasl

Added as a prototype feature in #2823 and #2843, thanks @jdariasl

Nov 14 '22 16:11 carolineechen
