pocketsphinx
Speed up volume adaptation
I am using AlexaPi on a Raspberry Pi, but when I mute the TV and say the hotword (detected with pocketsphinx), it doesn't hear me because the mic volume is still adapted to the loud TV.
Is there a way to speed up the volume recovery so that I can be heard within 2 or 3 seconds?
It would require an algorithm like the one described in this paper (automatic gain control):
https://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/43289.pdf
I saw the thread and looked up the AGC in the paper. I decided to experiment with the idea by trying out an implementation. @nshmyrev, could you please help me with some insight into section 2.2.1, "A Generative Model of Peak Signal Level": "We then compute the peak signal level, l, of the audio samples in each of these chunks, 0 <= l <= 1." What is the author referring to by "peak signal level"? Do they mean taking the maximum absolute sample value of the 100 ms audio chunk and dividing it by the maximum value representable by the sample type, or something else? Thank you for any light you can shed on this.
Most "peak signal level" algorithms are based on root-mean-square values, or something related.
This is not so much an issue of AGC (which doesn't work) but of the slow adaptation of live-mode CMN. You can try re-recognizing the audio segments in batch mode (i.e. set full_utt to TRUE in ps_process_raw()) to see the difference. It is likely that we will provide a better interface for manipulating the CMN, as it is far more important than any other acoustic adaptation.
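For anyone wanting to try this, a rough sketch of the batch-mode path (assuming the pocketsphinx 5 C API, where the last two arguments of ps_process_raw() are no_search and full_utt) might look like:

```c
#include <pocketsphinx.h>

/* Re-recognize a fully buffered utterance in batch mode. Passing TRUE for
 * the full_utt argument of ps_process_raw() tells the decoder it has the
 * whole utterance, so CMN can be estimated from all of it at once instead
 * of adapting slowly sample-by-sample as in live mode. */
static const char *recognize_batch(ps_decoder_t *ps,
                                   const int16 *samples, size_t n_samples)
{
    ps_start_utt(ps);
    /* no_search = FALSE, full_utt = TRUE */
    ps_process_raw(ps, samples, n_samples, FALSE, TRUE);
    ps_end_utt(ps);
    return ps_get_hyp(ps, NULL);
}
```

Comparing the hypothesis from this against the live-mode result on the same buffer should show the difference nshmyrev describes.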
Hiya @dhdaines
Talking of volume, I ran a test yesterday building with the new changes. A couple of questions for you.
a, has the API changed? I ask since get_in_speech() appears to have been removed from the pocketsphinx.h header.
b, if it has changed, how is the new API meant to be structured?
Thank you
Hi! The streaming API was unfortunately rather buggy, so I reverted everything to the PocketSphinx 0.8 version, removing the internal voice activity detection, because it broke some expectations about time alignments that many people, including me, relied on.
That said, ps_get_in_speech(), even though it's a bad API, is simple and pretty widely used, so I will bring it back soon, maybe today :)
Hi David,
Curious how else to assess speech state, if it's a bad API?! I'm not familiar with the old implementation. Thank you.
Speech detection is now done externally to the decoder, for various reasons (it is logical to do it internally to the decoder, but that has various practical problems). CMN can now be manipulated through the ps_get_cmn() and ps_set_cmn() functions; see https://cmusphinx.github.io/doc/pocketsphinx/pocketsphinx_8h.html and also https://github.com/cmusphinx/pocketsphinx/blob/master/test/unit/test_ps.c#L62