Pre-emphasis and its variants?

Open underdogliu opened this issue 3 years ago • 6 comments

🚀 The feature

I would like to have or add (by myself, maybe?) pre-emphasis filtering into the audio processing step.

Motivation, pitch

Pre-emphasis boosts the energy in the high frequencies, especially for voiced segments. It is therefore beneficial at least for speaker verification tasks, and I believe for others as well.

Alternatives

Going further, there are other linear/nonlinear filtering and normalization operations that could be integrated, most of which can be sourced from other audio toolkits such as librosa. But I think we should focus on pre-emphasis in torchaudio.transforms and torchaudio.functional first.

Additional context

No response

underdogliu avatar Apr 07 '22 19:04 underdogliu

Hi @underdogliu, thanks for the suggestion! We don't have objections against supporting pre-emphasis, but were wondering if you could elaborate a bit more on what you're referring to for the variants, and if there's any existing implementation/paper/references you can link regarding this?

carolineechen avatar Apr 15 '22 22:04 carolineechen

@carolineechen Sorry for the late reply. I've been busy with many things in parallel.

Pre-emphasis is nothing but a time-domain FIR filter. By variants, I mean other types of filters that could be used to flatten the spectrum. Of course, we can start with just a minimal version; the final decision is yours.

One reference: https://mini.dcs.shef.ac.uk/wp-content/papercite-data/pdf/loweimi_nolisp13.pdf

underdogliu avatar Apr 23 '22 07:04 underdogliu

@underdogliu got it, yea I think adding standard pre-emphasis to torchaudio transforms and functional (under filtering) could be a good starting point! Is this something you're interested in working on?

Also, quick question: would we need to add a corresponding de-emphasis function for this to be useful, or is that unnecessary or already handled by torchaudio's deemph_biquad function?

carolineechen avatar Apr 27 '22 20:04 carolineechen

Yeah, if necessary I am happy to spend some time developing it while getting more familiar with how torchaudio works. Of course, such a first-order FIR filter in the time domain can be regarded as a special case of the bi-quad function (b_0=1, a_0=1, b_1=-alpha, all other coefficients zero).
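To make the special case concrete, here is a minimal pure-Python sketch, not torchaudio code. The helper names `biquad` and `preemphasis` are hypothetical, the value alpha=0.97 is only an assumed example, and the sample before the start of the signal is assumed to be zero. It compares the direct first-order filter y[n] = x[n] - alpha*x[n-1] against a generic bi-quad with the coefficients listed above.

```python
def biquad(x, b0, b1, b2, a0, a1, a2):
    """Generic bi-quad (hypothetical helper, direct form I):
    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    Samples before the start of the signal are assumed to be zero."""
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append((b0 * x[n] + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0)
    return y


def preemphasis(x, alpha=0.97):
    """First-order FIR pre-emphasis: y[n] = x[n] - alpha * x[n-1].
    alpha=0.97 is an assumed example value, not taken from this thread."""
    return [x[n] - alpha * (x[n - 1] if n >= 1 else 0.0) for n in range(len(x))]


signal = [1.0, 0.5, -0.25, 0.0, 0.75]
alpha = 0.97

direct = preemphasis(signal, alpha)
# Same filter expressed as a bi-quad: b_0=1, b_1=-alpha, a_0=1, the rest zero.
via_biquad = biquad(signal, b0=1.0, b1=-alpha, b2=0.0, a0=1.0, a1=0.0, a2=0.0)
```

Both lists should agree sample by sample, since the feedback terms vanish when a_1 and a_2 are zero.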

Speaking of that function, I also have a question that may be naive: when I was checking it, I found that most of the simple computations are done via math instead of torch. Is that because we are handling scalars? I am not sure this still works when we want to make certain parameters learnable (analogous to PCEN and learnable STFT).

underdogliu avatar Apr 29 '22 12:04 underdogliu

@underdogliu a good start might be adopting https://github.com/csteinmetz1/auraloss/blob/main/auraloss/perceptual.py#L39

faroit avatar Apr 29 '22 12:04 faroit

I hope we will be able to implement the pre-emphasis filtering with torchaudio.functional.lfilter. Can somebody please comment on this?

stonelazy avatar Sep 03 '22 01:09 stonelazy

addressed in #2871

carolineechen avatar Dec 06 '22 19:12 carolineechen