KWS for TensorFlow Lite Micro

Open ctwillson opened this issue 1 year ago • 5 comments

The feature extraction uses MFCC; however, it appears that TFLM (TensorFlow Lite for Microcontrollers) does not support the MFCC (Mel-Frequency Cepstral Coefficients) operator. How can I use it on TFLM?

ctwillson avatar Sep 23 '24 03:09 ctwillson

You would have to do the feature extraction within your pre-processing code. You can see an example of how to do this here: https://github.com/ARM-software/ML-examples/tree/506c941bebdeb55aedc1b8cc53f27c482cf67ec8/tflu-kws-cortex-m/kws_cortex_m/Source/MFCC or here: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ml-embedded-evaluation-kit/+/refs/heads/main/source/application/api/use_case/kws/src/KwsProcessing.cc
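For illustration only, here is a minimal, dependency-free sketch of computing one MFCC frame in C++. It is not the code from the linked examples: it uses a naive DFT instead of an FFT (e.g. CMSIS-DSP) for brevity, and the sample rate, frame length, mel bin count, and coefficient count below are assumptions that must match whatever the model was actually trained with.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kPi = 3.14159265f;
constexpr float kSampleRate = 16000.0f;  // assumption: 16 kHz input
constexpr std::size_t kFrameLen = 640;   // assumption: 40 ms frame
constexpr std::size_t kNumMelBins = 40;  // assumption
constexpr std::size_t kNumMfcc = 10;     // assumption

static float HzToMel(float hz) { return 1127.0f * std::log1p(hz / 700.0f); }
static float MelToHz(float mel) { return 700.0f * (std::exp(mel / 1127.0f) - 1.0f); }

// Computes kNumMfcc cepstral coefficients from one kFrameLen-sample frame.
std::vector<float> ComputeMfccFrame(const float* samples) {
  // 1. Hamming window.
  std::vector<float> win(kFrameLen);
  for (std::size_t n = 0; n < kFrameLen; ++n) {
    win[n] = samples[n] *
             (0.54f - 0.46f * std::cos(2.0f * kPi * n / (kFrameLen - 1)));
  }

  // 2. Power spectrum. A naive DFT keeps the sketch self-contained; a real
  //    implementation would use an FFT for speed.
  const std::size_t kBins = kFrameLen / 2 + 1;
  std::vector<float> power(kBins);
  for (std::size_t k = 0; k < kBins; ++k) {
    float re = 0.0f, im = 0.0f;
    for (std::size_t n = 0; n < kFrameLen; ++n) {
      const float a = 2.0f * kPi * k * n / kFrameLen;
      re += win[n] * std::cos(a);
      im -= win[n] * std::sin(a);
    }
    power[k] = re * re + im * im;
  }

  // 3. Triangular mel filterbank -> log mel energies.
  const float melLo = HzToMel(20.0f);
  const float melHi = HzToMel(kSampleRate / 2.0f);
  std::vector<float> centre(kNumMelBins + 2);  // filter edges in FFT-bin units
  for (std::size_t m = 0; m < centre.size(); ++m) {
    const float mel = melLo + (melHi - melLo) * m / (kNumMelBins + 1);
    centre[m] = MelToHz(mel) * kFrameLen / kSampleRate;
  }
  std::vector<float> logMel(kNumMelBins);
  for (std::size_t m = 1; m <= kNumMelBins; ++m) {
    float e = 0.0f;
    for (std::size_t k = 0; k < kBins; ++k) {
      const float kf = static_cast<float>(k);
      float w = 0.0f;
      if (kf >= centre[m - 1] && kf <= centre[m])
        w = (kf - centre[m - 1]) / (centre[m] - centre[m - 1]);
      else if (kf > centre[m] && kf <= centre[m + 1])
        w = (centre[m + 1] - kf) / (centre[m + 1] - centre[m]);
      e += w * power[k];
    }
    logMel[m - 1] = std::log(e + 1e-10f);
  }

  // 4. DCT-II over the log mel energies gives the cepstral coefficients.
  std::vector<float> mfcc(kNumMfcc);
  for (std::size_t i = 0; i < kNumMfcc; ++i) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < kNumMelBins; ++m)
      sum += logMel[m] * std::cos(kPi * i * (m + 0.5f) / kNumMelBins);
    mfcc[i] = sum;
  }
  return mfcc;
}
```

The model input would then be built by stacking these per-frame vectors over however many frames the model expects, exactly as the linked pre-processing code does with its own (optimised) MFCC implementation.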

Burton2000 avatar Sep 26 '24 15:09 Burton2000

Thanks for your reply. By the way, streaming processing is very important for keyword spotting; does ML-Zoo support it? As you know, latency is very important for microcontrollers.

ctwillson avatar Sep 27 '24 02:09 ctwillson

The ML-Zoo repository is only for providing ML models for people to use. It isn't focused on showing complete end-to-end embedded applications.

The links I provided above show how to use these KWS models in a streaming audio use case, so they should be helpful for you.

Burton2000 avatar Sep 27 '24 09:09 Burton2000

On the model side of an embedded application, we actually don't need to send a whole 2 s (or 1 s) audio frame for each prediction, because the audio stream is sequential; we only need about 10 ms of new audio at a time and can combine it with the earlier frames. Sorry for my poor English, but you can read this paper: https://arxiv.org/abs/2005.06720. I have no idea how to do that. Do you have any suggestions?

ctwillson avatar Sep 28 '24 10:09 ctwillson

Thanks for sending the paper link; I understand your question now.

The answer is no, we do not provide any implementation of this type of streaming. We only provide a sample of what happens in fig. 1a of that paper, i.e. get enough audio for one inference, run an inference, collect some more audio, run another whole inference, and repeat; see the sketch below.
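For reference, a rough sketch of that repeated whole-window loop (not the paper's internal-state streaming). `ReadAudioHop` and `RunKwsInference` are hypothetical placeholders, not functions from ML-Zoo or the linked examples; they stand in for the platform's audio driver and for feature extraction plus the TFLM interpreter invocation. The 1 s window and 100 ms hop are assumptions that must match the model's expected input.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kSampleRate = 16000;         // assumption: 16 kHz input
constexpr std::size_t kWindowLen = kSampleRate;    // 1 s of audio per inference
constexpr std::size_t kHopLen = kSampleRate / 10;  // new audio every 100 ms

// Hypothetical stand-in for the audio driver: fills dst with kHopLen samples.
static void ReadAudioHop(int16_t* dst) {
  std::memset(dst, 0, kHopLen * sizeof(int16_t));  // silence placeholder
}

// Hypothetical stand-in for MFCC extraction plus the TFLM Invoke() call;
// returns the index of the detected keyword, or -1 for no detection.
static int RunKwsInference(const int16_t* window, std::size_t len) {
  (void)window;
  (void)len;
  return -1;
}

int main() {
  std::vector<int16_t> window(kWindowLen, 0);
  for (;;) {  // firmware main loop; runs forever on a real device
    // Slide the 1 s window left by one hop and append the newest samples.
    std::memmove(window.data(), window.data() + kHopLen,
                 (kWindowLen - kHopLen) * sizeof(int16_t));
    ReadAudioHop(window.data() + (kWindowLen - kHopLen));

    // A complete inference over the whole window, once per hop ("fig 1.a").
    if (RunKwsInference(window.data(), kWindowLen) >= 0) {
      // Keyword detected: trigger the application action here.
    }
  }
}
```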

Burton2000 avatar Sep 30 '24 16:09 Burton2000