ML-KWS-for-MCU Training the model with GFCCs

Hello everyone, has any of you used Gammatone filter banks instead of LFBEs and MFCCs to train your model? What is the SNR (signal-to-noise ratio) of these models? Thank you!

May 15 '19 11:05 saichand07

I tested GFCCs in training and also on a microcontroller. They worked better than MFCCs, but my current implementation also requires more resources. I don't quite understand what you mean by the SNR of a model.

I'm currently working on improving the prototype version of GFCC extraction and I'm at the moment testing it on my custom data set. It's quite raw and dirty at the moment, but maybe you will find something useful: https://github.com/tpeet/ML-KWS-for-MCU.

Things I've learned:

The first GFCC coefficients are much larger than the others and are not centered around zero, so a lot of information is lost in quantization. My solution would be to find the min/max values of each coefficient over the training set before training and then use these values to scale each coefficient individually to between -1 and 1, to improve the accuracy of the quantized model.
Gammatone filters are continuous and cover the whole spectrum, so they use more memory and require more computation than triangular MFCC filters. My solution was to compute the filter values and the DCT matrix in Python and save them to a C++ header file, so they could be loaded as constants into FLASH memory.
For my solution, 5 DCT coefficients were enough, and therefore I used only 10 filterbank filters, which saved a lot of computation.
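A minimal sketch of the first two points, not the code from the linked repository. It assumes features arrive as a list of frames (one list of coefficients per frame), and it uses an unnormalized DCT-II basis as a stand-in for whatever DCT variant the real extraction uses. The names `fit_min_max`, `scale_frame`, `dct_matrix`, `to_c_header` and `kDctMatrix` are hypothetical.

```python
import math


def fit_min_max(frames):
    """Per-coefficient min and max over all training frames."""
    columns = list(zip(*frames))  # one tuple per coefficient index
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_frame(frame, mins, maxs):
    """Map each coefficient to [-1, 1] using its own training range."""
    scaled = []
    for x, lo, hi in zip(frame, mins, maxs):
        span = hi - lo
        if span == 0:
            scaled.append(0.0)  # constant coefficient carries no information
            continue
        y = 2.0 * (x - lo) / span - 1.0
        scaled.append(max(-1.0, min(1.0, y)))  # clip values outside the training range
    return scaled


def dct_matrix(num_coeffs, num_filters):
    """Unnormalized DCT-II basis (assumed variant), num_coeffs x num_filters."""
    return [
        [math.cos(math.pi * k * (n + 0.5) / num_filters) for n in range(num_filters)]
        for k in range(num_coeffs)
    ]


def to_c_header(name, matrix):
    """Render a 2-D table as a flat, row-major `static const float` array."""
    rows, cols = len(matrix), len(matrix[0])
    lines = [
        "#pragma once",
        f"// {name}: {rows} x {cols}, row-major",
        f"static const float {name}[{rows * cols}] = {{",
    ]
    for row in matrix:
        lines.append("    " + ", ".join(f"{v:.8f}f" for v in row) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy training set: the first coefficient is large and off-center.
    frames = [[100.0, 1.0], [200.0, -1.0], [150.0, 0.0]]
    mins, maxs = fit_min_max(frames)
    print(scale_frame([150.0, 0.0], mins, maxs))
    # 5 DCT coefficients over 10 filters, as in the post.
    print(to_c_header("kDctMatrix", dct_matrix(5, 10)))
```

The same `mins` and `maxs` would need to be exported alongside the tables so the microcontroller applies identical scaling at inference time. Whether a `static const` array actually ends up in flash depends on the toolchain and linker script.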

May 19 '19 17:05 tpeet

@tpeet Thank you very much, that's really helpful. I trained my models with LFBEs, which give better results than MFCCs in Python. I haven't deployed them on a microcontroller yet.

May 19 '19 17:05 saichand07

@tpeet Have you tested or checked the Word Error Rate (WER) and the false word detection rate of your models on the board?

May 19 '19 17:05 saichand07

@saichand07, I haven't tested on a KWS task; I used my own bird sounds dataset. It's very hard to get a WER, as it depends on your embedded system, the surrounding noise, how far you are from the microphone, etc.

But maybe my research gives you some ideas about how these factors can impact accuracy. I played some audio clips through speakers and recorded them through the embedded system, generating more realistic validation and test sets. When testing on the original audio recordings, test accuracy was over 90%; when testing on the recordings from the embedded system, accuracy dropped to 70% for MFCCs. Therefore, I also recorded background noise in this manner and added it to the training samples. This improved accuracy from 70% to 75% for MFCCs. With GFCCs I got around 79%, even without the added background noise.
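The post does not say how the recorded noise was mixed in. As one possible approach, here is a rough sketch that adds a random segment of a noise recording to a clip at a chosen signal-to-noise ratio. The function names and the SNR-based gain are assumptions, not the method actually used here.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_noise(clip, noise, snr_db, rng=random):
    """Add a random segment of `noise` to `clip` at roughly `snr_db` dB SNR."""
    if len(noise) < len(clip):
        raise ValueError("noise recording must be at least as long as the clip")
    start = rng.randrange(len(noise) - len(clip) + 1)
    segment = noise[start:start + len(clip)]
    noise_rms = rms(segment)
    if noise_rms == 0:
        return list(clip)  # silent segment: nothing to add
    # Gain chosen so that rms(clip) / rms(gain * segment) == 10 ** (snr_db / 20).
    gain = rms(clip) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(clip, segment)]


if __name__ == "__main__":
    clip = [1.0, -1.0] * 4
    noise = [0.5, -0.5] * 8
    noisy = mix_noise(clip, noise, snr_db=10.0, rng=random.Random(0))
    print(len(noisy))
```

Drawing the SNR from a range per training sample, rather than fixing it, is a common way to cover different recording conditions.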

May 19 '19 18:05 tpeet

@tpeet Thank you very much. Yes, I have read in the literature that GFCCs perform very well in the presence of background noise compared to MFCCs (https://pdfs.semanticscholar.org/6b5d/1d2767fbd0ce9670bf334b3fd73a4cbb3a33.pdf).

May 20 '19 08:05 saichand07
