nnAudio
[Feature Request] Allow STFT kernels to be normalized
I think it would be nice to have normalization options for the STFT kernels, in order to control the norm of the output. They already exist for the CQT, in the forward pass, via the normalization_type parameter.
If you want I can do a PR.
Yes please! It would be a great help. We would then also need to add a few more test cases to the pytest file, if possible, to make sure the normalizations are correct. Currently I am using librosa as a reference, but librosa doesn't seem to provide a normalization option either. Do you know of any other references that we can compare our implementation against?
Regarding tests, the only idea I have is to test the STFT of pure sinusoidal functions, which will have a maximum magnitude of 1/2 if the kernels are normalized with the L1 norm. We could also implement the wrap style, which "wraps" positive and negative frequencies together so that the maximum value is 1.
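To make the sinusoid test concrete, here is a minimal NumPy sketch of a single STFT frame. It does not use nnAudio; the function name, the periodic Hann window, and the choice of a bin-centered cosine are all assumptions made for illustration.

```python
import numpy as np

def l1_normalized_peak(n_fft=1024, bin_index=64):
    """Peak magnitude of one STFT frame of a unit-amplitude cosine,
    using an L1-normalized window (hypothetical helper, not nnAudio API)."""
    n = np.arange(n_fft)
    # Periodic Hann window, built by hand so the sketch only needs NumPy.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)
    # L1 normalization: the window coefficients sum to 1.
    window /= np.abs(window).sum()
    # Cosine whose frequency falls exactly on a DFT bin.
    x = np.cos(2 * np.pi * bin_index * n / n_fft)
    spectrum = np.fft.rfft(window * x)
    return np.abs(spectrum).max()

peak = l1_normalized_peak()
wrapped_peak = 2 * peak  # "wrap" style: fold negative frequencies onto positive ones
print(peak, wrapped_peak)
```

The cosine splits its energy equally between the positive and negative frequency bins, which is where the 1/2 comes from. A frequency that falls between bins would give a smaller peak, so a test along these lines probably needs bin-centered frequencies or a tolerance.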
These normalization questions came up because, when working with floating-point audio signals (which lie between -1 and 1), I want the STFT (or CQT) magnitude to lie between 0 and 1.
It may also be interesting to have L2 normalization, which ensures an interesting property illustrated in the next formula (from Foundations of Time-Frequency Analysis by K. Gröchenig)
Here, g is the window function and V_g stands for the STFT. We could also check that property in the tests, but we may have approximation errors...
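As a rough idea of what such a test could look like, assuming the property in question is the energy identity ||V_g f||_2 = ||f||_2 ||g||_2 (so the STFT preserves energy when ||g||_2 = 1), here is a NumPy sketch. The discrete setup is my own choice and not nnAudio's: hop size 1, circular window shifts, and an orthonormal FFT. The function name is hypothetical.

```python
import numpy as np

def stft_energy(x, g):
    """Total energy of a full discrete STFT (hop 1, circular shifts).
    Hypothetical helper for illustration, not part of nnAudio."""
    length = len(x)
    # One frame per shift m: x[n] * g[n - m], with the window wrapping around.
    frames = np.stack([x * np.roll(g, m) for m in range(length)])
    # norm="ortho" makes each per-frame DFT energy-preserving.
    coeffs = np.fft.fft(frames, axis=1, norm="ortho")
    return np.sum(np.abs(coeffs) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)

# Zero-padded Hann window, L2-normalized so that ||g||_2 = 1.
g = np.zeros(256)
g[:64] = np.hanning(64)
g /= np.linalg.norm(g)

print(stft_energy(x, g), np.sum(x ** 2))
```

In this setup the two printed values should agree up to floating-point error. With a hop size larger than 1 or with non-circular padding the identity only holds approximately or up to a constant factor, so a real test would need a tolerance.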
I think approximation errors should be good enough!