sof Audio: Add audio feature extractor component MFCC

Jun 29 '22 16:06 singalsu

The work so far runs in the testbench. It reads a wav file and writes the MFCC data to a raw binary file. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from the audio capture file. E.g.

cd $SOF_WORKSPACE/sof
scripts/build-tools.sh -t
scripts/rebuild-testbench.sh 
cd tools/tune/mfcc/
./run_mfcc.sh /usr/share/sounds/alsa/Front_Center.wav 
octave --gui &
decode_ceps('mfcc.raw',13);

The above commands create this plot:

[Image: Screenshot from 2022-06-29 19-20-26]

The MFCC operation follows the configuration defined in mfcc_setup.m, which was used to output the configuration blob. Editing that script and repeating the above steps with the test topologies would change the appearance of the audio features plot.

Jun 29 '22 16:06 singalsu

The just pushed version ran successfully on my TGL-H test device. The topology used was sof-hda-generic-2ch-mfcc.tplg. The average load was a quite decent 24 MCPS for 16 kHz mono MFCC computation with FFT size 512, FFT hop 10 ms, Hamming window, 23 Mel bands, and 13 cepstral coefficients out. The output stream is not ALSA compress but a fake PCM stream: a magic sync word followed by 16 bit data where cepstral coefficients are inserted, and zeros otherwise, so that the sink keeps the same PCM format as the source.
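As a rough illustration of the fake PCM stream described above, here is a minimal Python sketch that scans 16 bit words for a sync marker and collects the coefficients that follow it. The marker value, its length, and the assumption that the coefficients follow it directly are all hypothetical; the real layout is defined by the component and handled by the provided decode_ceps script.

```python
import struct

# Hypothetical layout: the actual sync word and frame format are defined by the
# component and its decode script, not by this sketch.
MAGIC = (0x5A5A, 0xA5A5)  # hypothetical two-word sync marker
NUM_CEPS = 13             # cepstral coefficients per frame, as in the run above


def extract_ceps(raw, num_ceps=NUM_CEPS, magic=MAGIC):
    """Scan little-endian 16 bit words and return the frames that follow each marker."""
    count = len(raw) // 2
    words = struct.unpack("<%dH" % count, raw[:2 * count])
    marker = tuple(magic)
    frames = []
    i = 0
    while i + len(marker) + num_ceps <= count:
        if words[i:i + len(marker)] == marker:
            start = i + len(marker)
            payload = words[start:start + num_ceps]
            # Reinterpret the unsigned words as signed 16 bit values.
            frames.append([w - 0x10000 if w >= 0x8000 else w for w in payload])
            i = start + num_ceps
        else:
            i += 1  # zero padding or unrelated data, keep scanning
    return frames
```

A scan like this can report a false frame if the payload happens to contain the marker pattern, which is one reason a proper ALSA compress stream would be preferable to a fake PCM stream.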

Jul 01 '22 18:07 singalsu

This version moves the matrix, window, and Mel frequency functions into separate generic library functions.

Jul 27 '22 11:07 singalsu

This version changes the FFT to a new 16 bit version. It saves a lot of RAM with minimal impact on quality for 16 bit data. Since there's a lot to review, I will split the FFT change into another PR. I'm now happy with the FFT quality, so after that PR the FFT should be ready for xtensa SIMD optimization patches.

Jul 29 '22 16:07 singalsu

The just pushed version fixes a build issue with the functions comp_update_buffer_consume/produce(), whose names have changed. No other changes. Also, more recent versions of the FFT library and the window functions library are now in their own PRs.

Aug 15 '22 15:08 singalsu

@ShriramShastry would you be able to review. Thanks !

Aug 24 '22 11:08 lgirdwood

> @ShriramShastry would you be able to review. Thanks !

Sure, I'll review the PR. Thank you

Aug 24 '22 11:08 ShriramShastry

@singalsu conflicts

Aug 31 '22 15:08 lgirdwood

Hi, Seppo

For such a big feature, if people want to know the background, do we have a design document that explains the details? If not, I would suggest adding a README that describes the whole feature, so that reviewers can understand it better. What do you think? It could cover e.g. the design, filter type, filter stages, Q format, etc.

Thanks Tim

The reference code for this component is in #5769. The target is a low-power SOF component whose setup parameters are compatible with the PyTorch library Kaldi MFCC and the librosa MFCC. With a limited set of parameters, the Matlab concept achieves a fair match with PyTorch. Librosa compatibility needs to be improved. The output stream needs to be changed to the ALSA compress type; it is currently only a fake PCM stream. I hope I can make a small-scale user space ASR demo with those libraries that demonstrates the FW MFCC. I will keep adding more PyTorch- and librosa-like options to improve compatibility as parameters vary. See tools/tune/mfcc for how to set it up via a binary blob.
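For readers unfamiliar with the Kaldi-style Mel scale mentioned above, a minimal Python sketch of how band edges can be spaced evenly on that scale follows. The conversion mel = 1127 ln(1 + f/700) is the one Kaldi uses; the 20 Hz lower limit and the 8 kHz upper limit (Nyquist at 16 kHz) in the usage note are assumptions for illustration, not values taken from this PR, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Kaldi-style Mel scale conversion."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel()."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_band_edges(num_bands, low_hz, high_hz):
    """Return num_bands + 2 edge frequencies in Hz, evenly spaced on the Mel scale.

    Triangle i spans edges[i] to edges[i + 2] with its peak at edges[i + 1].
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 2)]
```

For example, `mel_band_edges(23, 20.0, 8000.0)` gives the 25 edge frequencies for 23 triangular bands under the assumed limits. The bands are narrow at low frequencies and widen toward the top of the range.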

Sep 01 '22 16:09 singalsu

The just pushed version is without libraries and with minor updates. This should build OK when #6178 is merged. I will next address the review feedback for this component.

Sep 01 '22 16:09 singalsu

Thanks, Seppo, that will be helpful. I roughly went through the MFCC design (from the outside). It is complex, and since our design is low power there must be some tradeoffs. Do we have a local C environment test for ASR with MFCC? With Matlab code only, it seems that only a few people will know this.

If more people want to understand the current MFCC framework, a diagram of the whole MFCC flow would be helpful, especially one compared with the full MFCC flow, so that more people see the benefit of our design. What do you think?

Thanks Tim

Sep 02 '22 03:09 btian1

@btian1 I would like to make a demo ASR (limited to e.g. recognizing numbers) with e.g. Python libraries and run it in user space on e.g. UP extreme or UP2 with MFCC data from the DSP. The MFCC flow is simple, but there's complexity in the details. ASR processed offline in Matlab would be an even quicker path, but I'd prefer a demo that can run on our test DUTs.

For now the best documents are about librosa and Pytorch: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html

Sep 02 '22 07:09 singalsu

The latest push contains a code rebase after the split of the DCT library, plus some cleanup. Work still remains to support the component test in the testbench environment. Fortunately, the tests for the main library functions are now in pretty good shape in their own PRs.

Sep 02 '22 16:09 singalsu

I previously pushed duplicates of the library patches by accident; this is now fixed.

Sep 02 '22 16:09 singalsu

Some results for MFCC component fixed point accuracy, from a test with a 1 s chirp signal (cepstral coefficient RMS error):

MFCC Matlab concept vs. PyTorch reference, 1e-4
MFCC component vs. PyTorch reference, 4.7081
FFT output overridden with concept FFT output, 1.6845
Mel filterbank output overridden with concept Mel filterbank output, 0.0349

The test used file read code to inject data from a file into the algorithm, replacing the first steps of the computation (FFT and Mel filterbank). It seems that the largest source of error vs. the floating point reference is the FFT. The Mel filterbank is less noisy, and the DCT seems to contribute very little to the error. The accuracy optimization effort should go mostly into the FFT.
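For reference, an RMS error figure of this kind can be computed as in the following Python sketch. It assumes the error is taken over all frames and all cepstral coefficients at once; the thread does not state the exact averaging used, and the function name is hypothetical.

```python
import math


def rms_error(test_frames, ref_frames):
    """RMS difference between two equally shaped lists of cepstral frames."""
    if len(test_frames) != len(ref_frames):
        raise ValueError("frame counts differ")
    total = 0.0
    count = 0
    for test_row, ref_row in zip(test_frames, ref_frames):
        if len(test_row) != len(ref_row):
            raise ValueError("coefficient counts differ")
        for test_value, ref_value in zip(test_row, ref_row):
            total += (test_value - ref_value) ** 2
            count += 1
    return math.sqrt(total / count)
```

Here `test_frames` would hold the component output and `ref_frames` the floating point reference, both as lists of per-frame coefficient lists.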

Sep 13 '22 17:09 singalsu

@singalsu is the small accuracy delta a result of using 32bit numbers for FW ?

Sep 16 '22 16:09 lgirdwood

> @singalsu is the small accuracy delta a result of using 32bit numbers for FW ?

The largest contribution to the difference is from the 16 bit FFT and the 16 bit Mel band triangles. I need to retest with the 32 bit FFT version. It's a kconfig option, so it is easy to switch.

I'd like to understand how much the difference in fixed point MFCC impacts the speech recognition word error rate. I should be able to do such a test too without too much work.

Sep 19 '22 11:09 singalsu

> > @singalsu is the small accuracy delta a result of using 32bit numbers for FW ?
>
> The largest difference contribution is from 16 bit FFT and 16 bit Mel band triangles. I need to retest with 32 bit FFT version. It's a kconfig option, so easy to switch.
>
> I'd like to understand how much the difference in fixed point MFCC impacts speech recognition word error rate. I should be able to do such test too with not too much work.

The RMS error in the chirp test drops from 4.708 to 1.771, so the 32 bit FFT improves accuracy quite a bit. Below is the error vs. PyTorch for the Matlab float version, the 32 bit FFT version of the component, and the 16 bit FFT version. All use 16 bit PCM data as input.
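To give a feel for why word length matters, the following Python sketch compares the rounding error of a single value stored with 15 and with 31 fractional bits. The Q1.15 and Q1.31 formats are assumptions here (the thread only says 16 bit and 32 bit), and the sketch does not model how rounding errors accumulate through the FFT stages, so it is not a prediction of the numbers above.

```python
import math


def quantize(value, frac_bits):
    """Round value in [-1, 1) to signed fixed point with frac_bits fractional bits."""
    scale = 1 << frac_bits
    q = int(round(value * scale))
    q = max(-scale, min(scale - 1, q))  # saturate to the representable range
    return q / scale


def rounding_rms(values, frac_bits):
    """RMS of the rounding error over a list of values."""
    errors = [quantize(v, frac_bits) - v for v in values]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Arbitrary test signal at half of full scale.
samples = [0.5 * math.sin(0.1 * k) for k in range(1000)]
rms_q15 = rounding_rms(samples, 15)  # assumed 16 bit format
rms_q31 = rounding_rms(samples, 31)  # assumed 32 bit format
```

The error of each stored value is bounded by half a least significant bit, which is 2^-16 for 15 fractional bits and 2^-32 for 31 fractional bits.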

Sep 20 '22 13:09 singalsu

To fix this, I tried changing src/audio/Kconfig to use select COMP_MODULE_ADAPTER instead of depends on.

[ 96%] Linking C executable sof
/home/sof/work/xtensa-imx-elf/lib/gcc/xtensa-imx-elf/10.2.0/../../../../xtensa-imx-elf/bin/ld: CMakeFiles/sof.dir/src/audio/mfcc/mfcc.c.o: in function `sys_comp_module_mfcc_interface_init':
/home/sof/work/sof.git/src/audio/mfcc/mfcc.c:282: undefined reference to `module_adapter_new'

Edit: That does not seem to help, so I changed back to depends on, which the other module_adapter clients use.

Sep 20 '22 14:09 singalsu

Changed vs. previous push, in Kconfig

        depends on COMP_MODULE_ADAPTER
        depends on !COMP_LEGACY_INTERFACE

that seems to avoid the imx build issue in CI.

Sep 27 '22 12:09 singalsu

@wszypelt @lrudyX the failed logs are timing out. Can you check. Thanks !

Oct 03 '22 14:10 lgirdwood

@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so within 2-3 hours it should be ready

Oct 03 '22 14:10 wszypelt