sof

Audio: Add audio feature extractor component MFCC

Open singalsu opened this issue 3 years ago • 5 comments

singalsu, Jun 29 '22

The work so far runs in the testbench. It reads a WAV file and writes the MFCC data to a raw binary file. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from the audio capture file. For example:

cd $SOF_WORKSPACE/sof
scripts/build-tools.sh -t
scripts/rebuild-testbench.sh 
cd tools/tune/mfcc/
./run_mfcc.sh /usr/share/sounds/alsa/Front_Center.wav 
octave --gui &

Then, in the Octave prompt:

decode_ceps('mfcc.raw',13);

The above commands create this plot:

[Figure: MFCC plot created by the commands above (screenshot from 2022-06-29 19-20-26)]

The MFCC computation followed the configuration defined in mfcc_setup.m, which was used to output the configuration blob. Editing it and repeating the above steps with the test topologies changes the appearance of the audio features plot.

singalsu, Jun 29 '22

The just-pushed version ran successfully on my TGL-H test device with the topology sof-hda-generic-2ch-mfcc.tplg. The average load was a fairly decent 24 MCPS for 16 kHz mono MFCC computation with FFT size 512, FFT hop 10 ms, Hamming window, 23 Mel bands, and 13 cepstral coefficients out. The output stream is not ALSA compress but a fake PCM stream: when cepstral coefficients are inserted, a magic sync word is followed by the 16-bit coefficient data; otherwise the stream carries zeros, so that the sink keeps the same PCM format as the source.
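The fake PCM framing described above can be sketched in Python. This is a minimal illustration, not the component's actual format: the sync word value `0x5A5AA5A5`, its placement as two 16-bit slots at the start of a hop, and the names `pack_hop()` and `parse_stream()` are all hypothetical. It uses the stated parameters (16 kHz mono, 10 ms hop, 13 cepstral coefficients), so each hop has 160 PCM slots, of which only 15 carry data and the rest are zero padding.

```python
import struct

# Hypothetical framing for illustration only; the real sync word value and
# layout are defined by the SOF MFCC component, not by this sketch.
SYNC_WORD = 0x5A5AA5A5          # assumed 32-bit magic, occupies two int16 slots
SAMPLE_RATE = 16000             # 16 kHz mono
HOP = SAMPLE_RATE * 10 // 1000  # 10 ms hop -> 160 PCM slots per hop


def pack_hop(ceps=None, hop=HOP):
    """Return one hop of little-endian int16 PCM bytes.

    With new cepstral coefficients: sync word, coefficients, zero padding.
    Without: all zeros, so the sink still sees a regular PCM stream.
    """
    if ceps is None:
        return bytes(2 * hop)
    payload = struct.pack("<I", SYNC_WORD) + struct.pack(f"<{len(ceps)}h", *ceps)
    assert len(payload) <= 2 * hop, "payload must fit in one hop"
    return payload + bytes(2 * hop - len(payload))


def parse_stream(data, num_ceps=13):
    """Scan int16-aligned positions for the sync word and collect frames."""
    frames = []
    frame_bytes = 4 + 2 * num_ceps
    i = 0
    while i + frame_bytes <= len(data):
        if struct.unpack_from("<I", data, i)[0] == SYNC_WORD:
            frames.append(list(struct.unpack_from(f"<{num_ceps}h", data, i + 4)))
            i += frame_bytes            # skip past the payload just read
        else:
            i += 2                      # advance one int16 slot
    return frames
```

For example, `parse_stream(pack_hop(c1) + pack_hop() + pack_hop(c2))` recovers `[c1, c2]` for two 13-value lists. Because frames are located by pattern matching, a real parser would also need to guard against coefficient data that happens to contain the sync pattern.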

singalsu, Jul 01 '22

This version moves the matrix, window, and Mel frequency functions into separate generic library functions.
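As a rough float illustration of what the generic window and Mel functions compute, here is a symmetric Hamming window and a triangular Mel filterbank. It assumes the HTK/Kaldi Mel scale, mel = 1127 ln(1 + f/700), triangles that are linear on the Mel axis, and a 20 Hz lower edge; the SOF library's fixed-point implementation and its options may differ.

```python
import math


def hamming(n):
    """Symmetric Hamming window: w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def hz_to_mel(f):
    """HTK/Kaldi-style Mel scale (assumed here)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_filterbank(fs, n_fft, n_mels, fmin=20.0, fmax=None):
    """Triangular filters, linear on the Mel axis, over FFT bins 0..n_fft/2."""
    fmax = fs / 2 if fmax is None else fmax
    mel_lo, mel_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    delta = (mel_hi - mel_lo) / (n_mels + 1)   # spacing of filter centers
    n_bins = n_fft // 2 + 1
    bank = []
    for m in range(n_mels):
        left = mel_lo + m * delta
        center = left + delta
        right = center + delta
        row = []
        for k in range(n_bins):
            mel = hz_to_mel(k * fs / n_fft)    # Mel value of FFT bin k
            if left < mel < center:
                row.append((mel - left) / (center - left))    # rising edge
            elif center <= mel < right:
                row.append((right - mel) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank
```

With the configuration used on the TGL-H device, `mel_filterbank(16000, 512, 23)` gives 23 rows of 257 weights, one weight per non-negative FFT bin.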

singalsu, Jul 27 '22

This version switches the FFT to a new 16-bit version. It saves a lot of RAM with minimal impact on quality for 16-bit data. Since there's a lot to review, I will split the FFT change into a separate PR. I'm now happy with the FFT quality, so after that PR the FFT should be ready for Xtensa SIMD optimization patches.
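The RAM saving can be estimated with back-of-the-envelope arithmetic. The sketch assumes one buffer of N complex samples stored as separate real and imaginary words, and ignores twiddle tables and other scratch memory, so it is not the component's real footprint. `to_q15()` is a hypothetical helper showing the kind of conversion 16-bit data implies: Q1.15 with rounding and saturation.

```python
def fft_buffer_bytes(n_fft, bits):
    """Bytes for one buffer of n_fft complex samples, real and imaginary
    parts stored as separate words of the given width."""
    return n_fft * 2 * bits // 8


def to_q15(x):
    """Convert a float in [-1, 1) to Q1.15: scale by 2^15, round
    (Python's round-half-to-even), and saturate to the int16 range."""
    return max(-32768, min(32767, round(x * 32768)))
```

For a 512-point FFT, `fft_buffer_bytes(512, 32)` is 4096 bytes versus 2048 bytes at 16 bits, so each such buffer is halved.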

singalsu, Jul 29 '22

The just-pushed version fixes a build issue with the functions comp_update_buffer_consume/produce(), which have changed names. There are no other changes. Also, more recent versions of the FFT library and the window functions library are now in their own PRs.

singalsu, Aug 15 '22

@ShriramShastry would you be able to review. Thanks !

lgirdwood, Aug 24 '22

Sure, I'll review the PR. Thank you.

ShriramShastry, Aug 24 '22

@singalsu conflicts

lgirdwood, Aug 31 '22

Hi, Seppo

For such a big feature, is there a design document that explains the background and details? If not, I would suggest adding a README that describes the whole feature (design, filter types, filter stages, Q formats, etc.) so that reviewers can understand it better. What do you think?

Thanks Tim

The reference code for this component is in #5769. The goal is a low-power SOF component whose setup parameters are compatible with the Kaldi MFCC in the Pytorch library and with librosa MFCC. With a limited set of parameters, the Matlab concept achieves a fair match with Pytorch. Librosa compatibility needs to be improved. The output stream needs to be changed to the ALSA compress type; currently it is only a fake PCM stream. I hope to make a small-scale user-space ASR demo with those libraries that demonstrates the FW MFCC. I will keep adding more Pytorch- and librosa-like options to improve compatibility across parameter variations. See tools/tune/mfcc for how to set it up via the binary blob.

singalsu, Sep 01 '22

The just-pushed version is without the libraries and has minor updates. It should build OK once #6178 is merged. Next, I will address the review feedback for this component.

singalsu, Sep 01 '22

Thanks, Seppo, that will be helpful. I roughly went through the MFCC design (outside SOF); it is complex, and since our design is low power, there must be some trade-offs. Do we have a local C environment test for ASR with MFCC? With only Matlab code, it seems few people know about this.

For people who want to understand the current MFCC framework, a diagram of the whole MFCC flow would help, especially one compared with the full MFCC flow, so that more people can see our design's benefits. What do you think?

Thanks Tim

btian1, Sep 02 '22

@btian1 I would like to make a demo ASR (limited to, e.g., recognizing numbers) with, e.g., Python libraries, and run it in UP Xtreme or UP2 user space with MFCC data from the DSP. The MFCC flow is simple, but there's complexity in the details. ASR processed offline in Matlab would be an even quicker path, but I'd prefer a demo that can run on our test DUTs.

For now, the best documents are those for librosa and Pytorch: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html
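For readers looking for the overall flow, a minimal float MFCC pipeline is sketched below in plain Python: Hamming window, zero-padded DFT power spectrum, triangular Mel filterbank, log, and orthonormal DCT-II. It is an illustration under stated assumptions, not the SOF component or the Pytorch/librosa implementations: it uses a naive DFT instead of an FFT, an assumed 25 ms analysis window, and the HTK/Kaldi Mel scale, and it omits Kaldi details such as dithering, pre-emphasis, DC offset removal, and cepstral liftering.

```python
import math


def hz_to_mel(f):
    """HTK/Kaldi-style Mel scale (assumed)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_bank(fs, n_fft, n_mels, fmin=20.0):
    """Triangular filters, linear on the Mel axis, over FFT bins 0..n_fft/2."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fs / 2)
    d = (hi - lo) / (n_mels + 1)          # spacing between filter centers
    bins = [hz_to_mel(k * fs / n_fft) for k in range(n_fft // 2 + 1)]
    return [[max(0.0, min((b - (lo + m * d)) / d, (lo + (m + 2) * d - b) / d))
             for b in bins] for m in range(n_mels)]


def mfcc_frame(frame, fs, n_fft=512, n_mels=23, n_ceps=13):
    """One MFCC vector: window -> power spectrum -> Mel energies -> log -> DCT-II."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    x = [s * w for s, w in zip(frame, win)] + [0.0] * (n_fft - n)  # zero-pad
    power = []
    for k in range(n_fft // 2 + 1):        # naive DFT; real code uses an FFT
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        power.append(re * re + im * im)
    mel = [sum(w * p for w, p in zip(row, power)) for row in mel_bank(fs, n_fft, n_mels)]
    logmel = [math.log(max(e, 1e-10)) for e in mel]   # floor avoids log(0)
    # Orthonormal DCT-II of the log Mel energies, keeping the first n_ceps terms
    return [math.sqrt((1.0 if q == 0 else 2.0) / n_mels)
            * sum(v * math.cos(math.pi * q * (m + 0.5) / n_mels)
                  for m, v in enumerate(logmel))
            for q in range(n_ceps)]


def mfcc(signal, fs=16000, win_ms=25, hop_ms=10, **kw):
    """Slide an analysis window over the signal; one MFCC vector per hop."""
    win = fs * win_ms // 1000              # 400 samples at 16 kHz (assumed 25 ms)
    hop = fs * hop_ms // 1000              # 160 samples at 16 kHz
    return [mfcc_frame(signal[i:i + win], fs, **kw)
            for i in range(0, len(signal) - win + 1, hop)]
```

Each stage maps to one box in a flow diagram; the fixed-point design questions (word lengths, scaling, table sizes) sit inside those same boxes.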

singalsu, Sep 02 '22

The latest push contains a code rebase after the split of the DCT library, plus some cleanup. Work still remains to support component testing in the testbench environment. Fortunately, the tests for the main library functions are now in pretty good shape in their own PRs.
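The DCT step can be expressed as a precomputed matrix, so that computing cepstral coefficients from log Mel energies becomes a single matrix-vector product. This sketch assumes an orthonormal DCT-II; whether the SOF DCT library uses this exact normalization is not stated here, and `dct_matrix()` and `mat_vec()` are illustrative names.

```python
import math


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis as an n_out x n_in matrix (rows = cepstral indices)."""
    return [[math.sqrt((1.0 if q == 0 else 2.0) / n_in)
             * math.cos(math.pi * q * (m + 0.5) / n_in)
             for m in range(n_in)] for q in range(n_out)]


def mat_vec(mat, vec):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]
```

With 23 Mel bands and 13 cepstral coefficients, `dct_matrix(13, 23)` is a 13 x 23 matrix with orthonormal rows, and a flat log Mel spectrum maps entirely into the first coefficient.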

singalsu, Sep 02 '22

Previously I accidentally pushed duplicates of the library patches; this is now fixed.

singalsu, Sep 02 '22

Some results for the MFCC component's fixed-point accuracy in a 1 s chirp signal test:

  • MFCC Matlab concept vs. Pytorch reference: ceps RMS error 1e-4
  • MFCC component vs. Pytorch reference: 4.7081
  • Component with the FFT output overridden by the concept FFT output: 1.6845
  • Component with the Mel filterbank output overridden by the concept Mel filterbank output: 0.0349

The test used file-read code to inject data from a file into the algorithm, replacing the first computation steps (FFT and Mel filterbank). The largest source of error vs. the floating-point reference seems to be the FFT. The Mel filterbank is less noisy, and the DCT seems to contribute very little error. The accuracy optimization effort should therefore go mostly into the FFT.
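The stage-injection method can be sketched generically: any stage of a pipeline can have its output replaced by reference data, and the resulting cepstra are compared to the reference with an RMS error. `run_pipeline()` and `rms_error()` are hypothetical helpers for illustration; the actual test used file-read code inside the component.

```python
import math


def rms_error(test, ref):
    """RMS difference over all frames and coefficients (lists of lists)."""
    diffs = [t - r for ft, fr in zip(test, ref) for t, r in zip(ft, fr)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def run_pipeline(x, stages, overrides=None):
    """Run (name, func) stages in order. If a stage name is in `overrides`,
    its output is replaced by the given reference data, mimicking the
    injection of float reference data read from a file."""
    overrides = overrides or {}
    for name, func in stages:
        x = overrides[name] if name in overrides else func(x)
    return x
```

Overriding one stage at a time and watching how much the final RMS error drops is what attributes most of the error to the FFT in the results listed earlier.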

singalsu, Sep 13 '22

@singalsu is the small accuracy delta a result of using 32-bit numbers in the FW?

lgirdwood, Sep 16 '22

The largest contribution to the difference comes from the 16-bit FFT and the 16-bit Mel band triangles. I need to retest with the 32-bit FFT version. It's a Kconfig option, so it is easy to switch.

I'd like to understand how much the fixed-point MFCC difference impacts the speech recognition word error rate. I should be able to do such a test too without much work.
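Word error rate (WER) is the word-level edit distance (substitutions, deletions, and insertions) between the recognized transcript and the reference, divided by the number of reference words. A minimal sketch follows; running one recognizer on both float and fixed-point MFCC features and comparing the two WERs is one possible way to set up such a test, not a described plan.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance. Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))        # cost of matching empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion of ref word r
                         cur[j - 1] + 1,           # insertion of hyp word h
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For a digit-recognition demo, the reference would be the spoken digit sequence and the hypothesis the recognizer output for each feature set.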

singalsu, Sep 19 '22

The RMS error in the chirp test drops from 4.708 to 1.771, so the 32-bit FFT improves accuracy quite a bit. Below is the error vs. Pytorch for the Matlab float version, the 32-bit FFT version of the component, and the 16-bit FFT version. All use 16-bit PCM data as input.

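[Figure: cepstral error vs. Pytorch for the Matlab float version, the 32-bit FFT component, and the 16-bit FFT component]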
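To get a feel for why a wider FFT word helps, the sketch below quantizes a chirp to 16-bit and 32-bit fixed point and measures the SNR. It models only a single rounding to Q1.15 or Q1.31, not the per-stage rounding and scaling inside a fixed-point FFT, so it shows the roughly 6 dB per bit trend but cannot predict the component's measured RMS errors. The chirp parameters (100 Hz to 7 kHz, amplitude 0.5) are assumptions.

```python
import math


def quantize(x, bits):
    """Round to signed Q1.(bits-1) fixed point with saturation, then back to float."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))
    return q / scale


def snr_db(ref, test):
    """Signal-to-noise ratio of `test` against `ref`, in dB."""
    sig = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    return 10 * math.log10(sig / noise)


def chirp(fs=16000, dur=1.0, f0=100.0, f1=7000.0, amp=0.5):
    """Linear chirp whose instantaneous frequency sweeps f0 -> f1 over dur seconds."""
    n = int(fs * dur)
    k = (f1 - f0) / dur
    return [amp * math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]
```

Computing `snr_db(x, [quantize(v, 16) for v in x])` and the same with 32 bits for `x = chirp()` gives roughly 92 dB and 188 dB, a gap close to the 16 x 6.02 dB that the extra bits predict.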

singalsu, Sep 20 '22

To fix this, I tried changing src/audio/Kconfig to select COMP_MODULE_ADAPTER instead of depending on it.

[ 96%] Linking C executable sof
/home/sof/work/xtensa-imx-elf/lib/gcc/xtensa-imx-elf/10.2.0/../../../../xtensa-imx-elf/bin/ld: CMakeFiles/sof.dir/src/audio/mfcc/mfcc.c.o: in function `sys_comp_module_mfcc_interface_init':
/home/sof/work/sof.git/src/audio/mfcc/mfcc.c:282: undefined reference to `module_adapter_new'

Edit: This does not seem to help, so I changed it back to "depends on", which the other module_adapter clients use.

singalsu, Sep 20 '22

Changed vs. the previous push, in Kconfig:

        depends on COMP_MODULE_ADAPTER
        depends on !COMP_LEGACY_INTERFACE

This seems to avoid the i.MX build issue in CI.

singalsu, Sep 27 '22

@wszypelt @lrudyX the failed runs are timing out. Can you check? Thanks!

lgirdwood, Oct 03 '22

@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so it should be ready within 2-3 hours.

wszypelt, Oct 03 '22