Audio: Add audio feature extractor component MFCC
So far the work runs in the testbench. It reads a WAV file and writes the MFCC data to a raw binary file. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from the audio capture file, e.g.:
cd $SOF_WORKSPACE/sof
scripts/build-tools.sh -t
scripts/rebuild-testbench.sh
cd tools/tune/mfcc/
./run_mfcc.sh /usr/share/sounds/alsa/Front_Center.wav
octave --gui &
decode_ceps('mfcc.raw',13);
The above commands create this plot:

The MFCC operation follows the configuration defined in mfcc_setup.m, which was used to generate the configuration blob. Editing it and redoing the above steps with the test topologies changes the appearance of the audio features plot.
The just-pushed version ran successfully on my TGL-H test device with the topology sof-hda-generic-2ch-mfcc.tplg. The average load was a quite decent 24 MCPS for 16 kHz mono MFCC computation with FFT size 512, FFT hop 10 ms, Hamming window, 23 Mel bands and 13 cepstral coefficients out. The output stream is not ALSA compress but a fake PCM stream: a magic sync word followed by 16-bit data when cepstral coefficients were inserted, and zeros otherwise, so that the sink keeps the same PCM format as the source.
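A minimal sketch of how a host-side tool might pull the cepstral coefficients out of such a fake PCM stream, assuming the coefficients are stored as contiguous 16-bit samples right after a two-sample sync word. The sync word value (MFCC_SYNC_0/MFCC_SYNC_1), the function name find_ceps() and the single-channel layout are hypothetical; the real values are defined by the component and its decode_ceps() parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sync word: "MF" "CC" in ASCII, one half per 16-bit sample.
 * The real value and width come from the component source. */
#define MFCC_SYNC_0 ((int16_t)0x4d46)
#define MFCC_SYNC_1 ((int16_t)0x4343)

/*
 * Scan 16-bit PCM samples from index 'start' for the sync word and copy the
 * num_ceps samples that follow it into ceps[]. Returns the index just past
 * the payload, or -1 if no complete sync word + payload is found.
 */
long find_ceps(const int16_t *pcm, size_t len, size_t start,
	       int16_t *ceps, size_t num_ceps)
{
	size_t i, k;

	for (i = start; i + 1 < len; i++) {
		if (pcm[i] != MFCC_SYNC_0 || pcm[i + 1] != MFCC_SYNC_1)
			continue;
		/* The whole payload must be inside the buffer. */
		if (i + 2 + num_ceps > len)
			return -1;
		for (k = 0; k < num_ceps; k++)
			ceps[k] = pcm[i + 2 + k];
		return (long)(i + 2 + num_ceps);
	}
	return -1; /* no sync word found */
}
```

Restarting the scan from the returned index skips the payload already consumed, so a cepstral value that happens to equal a sync-word half is not mistaken for a new frame. The zero padding between frames never matches the sync word.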
This version moves the matrix, window and Mel frequency functions into separate generic library functions.
This version changes the FFT to a new 16-bit version. It saves a lot of RAM with minimal impact on quality with 16-bit data. Since there's a lot to review, I will split the FFT change into another PR. I'm now happy with the FFT quality, so after that PR the FFT should be ready for Xtensa SIMD optimization patches.
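A back-of-the-envelope view of the RAM saving, assuming a single complex working buffer of FFT size with 16-bit (Q1.15) vs. 32-bit (Q1.31) real and imaginary parts. The struct and function names are illustrative; the real component also has twiddle factors and other buffers, so its total saving will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative complex sample layouts (names are hypothetical). */
struct cplx16 { int16_t real; int16_t imag; }; /* Q1.15 parts */
struct cplx32 { int32_t real; int32_t imag; }; /* Q1.31 parts */

/* The byte counts below assume no struct padding. */
static_assert(sizeof(struct cplx16) == 4, "unexpected padding");
static_assert(sizeof(struct cplx32) == 8, "unexpected padding");

/* Bytes for one complex working buffer of fft_size points. */
size_t fft_buf_bytes16(size_t fft_size)
{
	return fft_size * sizeof(struct cplx16);
}

size_t fft_buf_bytes32(size_t fft_size)
{
	return fft_size * sizeof(struct cplx32);
}
```

For FFT size 512, the 16-bit buffer takes 2048 bytes vs. 4096 bytes for the 32-bit one; the same halving applies to any twiddle table stored in the same format.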
The just-pushed version fixes a build issue with the functions comp_update_buffer_consume/produce(), which have been renamed. No other changes. Also, the more recent versions of the FFT library and the window functions library are now in their own PRs.
@ShriramShastry would you be able to review? Thanks!
Sure, I'll review the PR. Thank you.
@singalsu conflicts
Hi, Seppo
For such a big feature, is there a design document that explains the details for people who want to know the background? If not, I would suggest adding a README that describes the whole feature (design, filter type, filter stages, Q format, etc.) so that reviewers can understand it better. What do you think?
Thanks Tim
The reference code for this component is in #5769. The goal is a low-power SOF component whose setup parameters are compatible with the PyTorch Kaldi MFCC and the librosa MFCC. With a limited set of parameters, the Matlab concept achieves a fair match with PyTorch. Librosa compatibility needs to be improved. The output stream needs to be changed to the ALSA compress type; it's currently only a fake PCM stream. I hope I can make a small-scale user-space ASR demo with those libraries that demonstrates the FW MFCC. I will keep adding more PyTorch- and librosa-like options to improve compatibility across parameter variations. See tools/tune/mfcc for how to set it up via a binary blob.
The just-pushed version drops the libraries and has minor updates. It should build OK once #6178 is merged. Next I will address the review feedback for this component.
Thanks, Seppo, that will be helpful. I roughly went through the MFCC design (outside SOF); it is complex, and since our design is low power there must be some trade-offs. Do we have a local C environment test for ASR with MFCC? With only Matlab code, it seems only a few people would know how to use it.
If more people want to know the current MFCC framework, a diagram of the whole MFCC flow would be helpful, especially compared with the full MFCC flow, so that more people understand the benefits of our design. What do you think?
Thanks Tim
@btian1 I would like to make a demo ASR (limited to e.g. number recognition) with e.g. Python libraries, running in user space on e.g. an UP Xtreme or UP2 with MFCC data from the DSP. The MFCC flow is simple, but there's complexity in the details. An ASR processed offline in Matlab would be an even quicker path, but I'd prefer a demo that can run on our test DUTs.
For now, the best documents are the librosa and PyTorch ones: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html
The latest push contains a code rebase after the DCT library split, plus some cleanup. Work still remains to support component tests in the testbench environment. Fortunately, the tests for the main library functions are now in pretty good shape in their own PRs.
I previously pushed duplicate library patches by accident; this is now fixed.
Some results for the MFCC component fixed-point accuracy with a 1 s chirp signal test (ceps RMS error vs. the PyTorch reference):
- MFCC Matlab concept: 1e-4
- MFCC component: 4.7081
- MFCC component with FFT output overridden by the concept FFT output: 1.6845
- MFCC component with Mel filterbank output overridden by the concept Mel filterbank output: 0.0349
The test was done with file read code that injects data from a file into the algorithm to replace the first steps of the computation (FFT and Mel filterbank). It seems that the largest source of error vs. the floating-point reference is the FFT. The Mel filterbank is less noisy, and the DCT seems to contribute very little to the error. The accuracy optimization effort should mostly go to the FFT.
@singalsu is the small accuracy delta a result of using 32-bit numbers in the FW?
The largest contribution to the difference comes from the 16-bit FFT and the 16-bit Mel band triangles. I need to retest with the 32-bit FFT version. It's a Kconfig option, so it's easy to switch.
I'd like to understand how much the difference in the fixed-point MFCC impacts the speech recognition word error rate. I should be able to run such a test too without too much work.
In the chirp test, the RMS error drops from 4.708 to 1.771, so the 32-bit FFT improves accuracy quite a bit. Below is the error vs. PyTorch for the Matlab float version, the 32-bit FFT version of the component, and the 16-bit FFT version. All use 16-bit PCM data as input.

To fix this, I tried changing src/audio/Kconfig to select COMP_MODULE_ADAPTER instead of depending on it.
[ 96%] Linking C executable sof
/home/sof/work/xtensa-imx-elf/lib/gcc/xtensa-imx-elf/10.2.0/../../../../xtensa-imx-elf/bin/ld: CMakeFiles/sof.dir/src/audio/mfcc/mfcc.c.o: in function `sys_comp_module_mfcc_interface_init':
/home/sof/work/sof.git/src/audio/mfcc/mfcc.c:282: undefined reference to `module_adapter_new'
Edit: This doesn't seem to help, so I changed it back to "depends on", as the other module_adapter clients use.
Changed vs. the previous push, in Kconfig:
depends on COMP_MODULE_ADAPTER
depends on !COMP_LEGACY_INTERFACE
This seems to avoid the i.MX build issue in CI.
@wszypelt @lrudyX the failed logs are timing out. Can you check? Thanks!
@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so it should be ready within 2-3 hours.