
Question regarding filter bank

Open evaldsurtans opened this issue 7 years ago • 2 comments

Great job on implementing the paper!

Question: why did you use python_speech_features.fbank instead of librosa.feature.melspectrogram? Both transformations are the same, right?

evaldsurtans, Mar 30 '18 13:03

They are almost the same, except that fbank discards the IDFT procedure, so it retains more of the source information in the audio.

onceforall, Jun 05 '18 07:06

Thank you for answering. I suspect that you have mistaken melspectrogram/filterbank for MFCC, because as far as I understand the IDFT is used only for MFCC. So python_speech_features.fbank should be the same as librosa.feature.melspectrogram, right?
https://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html
https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html

evaldsurtans, Jul 11 '18 10:07
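
To make the distinction in this thread concrete, here is a minimal NumPy-only sketch of the two feature types. It is not the code of python_speech_features or librosa; the helper names (`mel_filterbank`, `power_frames`, `dct_ii`) and the parameter values (400-sample frames, 160-sample hop, 512-point FFT, 26 filters, 13 coefficients) are assumptions chosen for illustration, and no window function is applied for brevity. It shows that mel filterbank energies stop after the filterbank step, while MFCC adds one more transform (a DCT, playing the role of the inverse transform mentioned above) on the log energies. Details such as the mel formula, filter normalization, and windowing can differ between libraries, so the two library functions may not produce numerically identical output even though they compute the same kind of feature.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other mel scales exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):    # rising edge of the triangle
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            fb[i, k] = (right - k) / (right - center)
    return fb


def power_frames(signal, frame_len, hop, n_fft):
    """Split the signal into frames and return the power spectrum of each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return (np.abs(spectrum) ** 2) / n_fft


def dct_ii(x, n_out):
    """Unnormalized DCT-II along axis 1, keeping the first n_out coefficients."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)  # shape (n_out, n)
    return x @ basis.T


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone

power = power_frames(signal, frame_len=400, hop=160, n_fft=512)
fb = mel_filterbank(n_filters=26, n_fft=512, sample_rate=sample_rate)

# Filterbank / mel-spectrogram features: the pipeline stops here.
energies = power @ fb.T
energies = np.where(energies == 0, np.finfo(float).eps, energies)  # avoid log(0)
log_energies = np.log(energies)

# MFCC: one extra step, a DCT of the log filterbank energies.
mfcc = dct_ii(log_energies, n_out=13)

print(energies.shape, mfcc.shape)
```

Keeping only the first 13 DCT coefficients is what discards detail in the MFCC case; the filterbank energies keep all 26 bands per frame.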