
Speed up volume adaptation

Open mano1979 opened this issue 7 years ago • 7 comments

I am using AlexaPi on a Raspberry Pi, but when I mute the TV and say the hotword (detected with pocketsphinx), it doesn't hear me because the mic volume is still adapted to the loud TV.

Is there a way to speed up the volume recovery so that I can be heard within 2 or 3 seconds?

mano1979, Mar 17 '17 17:03

It would require an algorithm like the one described in this paper (automatic gain control):

https://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/43289.pdf

nshmyrev, Mar 18 '17 06:03

I saw the thread and looked up the AGC in the paper, and decided to experiment with the idea by trying out an implementation. @nshmyrev could you please give me some insight into section 2.2.1, "A generative Model of Peak Signal Level": "We then compute the peak signal level, l, of the audio samples in each of these chunks 0 <= l <= 1." What does the author mean by "peak signal level"? Is it the maximum absolute sample value in the 100 ms audio chunk divided by the maximum value representable by the sample type, or something else? Thank you for any light you can shed on this.

Rares14324, Oct 06 '17 12:10

Most "peak signal level" algorithms are based on root-mean-square values, or something related.
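
A minimal sketch of the two readings discussed here, assuming 16-bit signed PCM input: the literal peak (largest absolute sample divided by full scale) and an RMS-based level. The function names are hypothetical and neither is claimed to be what the paper's authors implemented. Both return a value l with 0 <= l <= 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Literal "peak" reading: largest absolute sample in the chunk divided by
 * the 16-bit full-scale magnitude (32768), giving a value in [0, 1]. */
double chunk_peak_level(const int16_t *samples, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t v = samples[i];      /* widen first so -32768 negates safely */
        if (v < 0)
            v = -v;
        if (v > peak)
            peak = v;
    }
    return (double)peak / 32768.0;
}

/* Square root by Newton's method, used only so the sketch needs no libm
 * link flag; sqrt() from <math.h> is the usual choice. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;    /* start at or above the true root */
    for (int i = 0; i < 60; ++i)
        r = 0.5 * (r + x / r);
    return r;
}

/* RMS reading: square root of the mean of the squared, normalized samples.
 * Also in [0, 1], and less sensitive to a single outlier sample than the
 * literal peak. */
double chunk_rms_level(const int16_t *samples, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double v = (double)samples[i] / 32768.0;
        sum_sq += v * v;
    }
    return newton_sqrt(sum_sq / (double)n);
}
```

At 16 kHz a 100 ms chunk is 1600 samples, so each function would be called once per 1600-sample buffer. A single click in an otherwise quiet chunk drives the literal peak to its full value while barely moving the RMS level, which is one reason the two readings can behave very differently.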

ulatekh, Sep 11 '18 20:09

This is less an AGC issue (AGC doesn't work) than a consequence of the slow adaptation of live-mode CMN (cepstral mean normalization). You can try re-recognizing the audio segments in batch mode (i.e. set full_utt to TRUE in ps_process_raw) to see the difference. We will likely provide a better interface for manipulating the CMN, as it is far more important than any other acoustic adaptation.
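
To illustrate why a running mean lags after a sudden level change while a whole-utterance mean does not, here is a simplified model. It is not PocketSphinx's actual CMN code: the exponential update, the 500-frame window, and all names are assumptions chosen for illustration, and only one cepstral coefficient is tracked.

```c
#include <stddef.h>

/* Simplified "live" estimator: an exponentially decaying running mean.
 * alpha = 1 / window_frames controls how quickly it forgets old input. */
typedef struct {
    double mean;   /* current estimate of the cepstral mean */
    double alpha;  /* update rate per frame */
} running_mean_t;

/* Feed one frame's coefficient and return the updated estimate. */
double running_mean_update(running_mean_t *rm, double x)
{
    rm->mean += rm->alpha * (x - rm->mean);
    return rm->mean;
}

/* "Batch" estimator: the exact mean over the whole segment, which is what
 * becomes possible when the full utterance is available at once. */
double batch_mean(const double *x, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum / (double)n;
}

/* Start from an estimate adapted to loud input, feed n_frames of a quieter
 * constant level, and return the running estimate afterwards. */
double running_mean_after_drop(double loud, double quiet,
                               size_t window_frames, size_t n_frames)
{
    running_mean_t rm = { loud, 1.0 / (double)window_frames };
    for (size_t i = 0; i < n_frames; ++i)
        running_mean_update(&rm, quiet);
    return rm.mean;
}
```

With a loud level of 10, a quiet level of 2, and a 500-frame window, the running estimate after 100 quiet frames (about one second at an assumed 100 frames per second) is still much closer to the loud level than to the quiet one, whereas the batch mean over those same 100 frames equals the quiet level. That gap is the kind of mismatch described above.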

dhdaines, Jun 13 '22 11:06

Hiya @dhdaines

Speaking of volume, I did a test build yesterday with the new changes, and I have a couple of questions for you. (a) Has the API changed? I ask because get_in_speech() appears to have been removed from the pocketsphinx.h header. (b) If it has changed, how is the new API meant to be structured?

Thank you

dotmain, Jul 07 '22 08:07

Hi! The streaming API was unfortunately rather buggy, so I reverted everything to the PocketSphinx 0.8 version, removing the internal voice activity detection, because it broke some expectations about time alignments that many people, including me, relied on.

That said, ps_get_in_speech(), even though it's a bad API, is simple and pretty widely used, so I will bring it back soon, maybe today :)

dhdaines, Jul 07 '22 11:07

Hi David,

I'm curious how else to assess speech state if ps_get_in_speech() is a bad API. I'm not familiar with the old implementation. Thank you.

#262

dotmain, Jul 07 '22 16:07

Speech state detection is now done externally to the decoder, for various reasons (doing it inside the decoder is logical but has various practical problems). CMN can now be manipulated through the ps_get_cmn() and ps_set_cmn() functions; see https://cmusphinx.github.io/doc/pocketsphinx/pocketsphinx_8h.html and also https://github.com/cmusphinx/pocketsphinx/blob/master/test/unit/test_ps.c#L62
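
As a hedged sketch of how an application might prepare a value for ps_set_cmn(): the helper below formats a stored cepstral-mean vector as a comma-separated string. The helper name, the two-decimal format, and the assumption that ps_set_cmn() takes such a string should all be checked against the linked header before use. The code does not call PocketSphinx itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write n cepstral-mean values into out as
 * "v0,v1,...,vn-1" with two decimals each. Returns the number of
 * characters written (excluding the terminating NUL), or -1 if the
 * buffer is too small. */
int format_cmn_string(const float *cmn, size_t n, char *out, size_t out_len)
{
    size_t used = 0;

    if (out == NULL || out_len == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; ++i) {
        /* Prefix every value except the first with a comma. */
        int w = snprintf(out + used, out_len - used, "%s%.2f",
                         i ? "," : "", (double)cmn[i]);
        if (w < 0 || (size_t)w >= out_len - used)
            return -1;               /* encoding error or truncated */
        used += (size_t)w;
    }
    return (int)used;
}

/* Assumed usage with a live decoder (not compiled here; verify the
 * signature in pocketsphinx.h first):
 *
 *     char buf[256];
 *     if (format_cmn_string(quiet_room_cmn, 13, buf, sizeof buf) > 0)
 *         ps_set_cmn(ps, buf);
 *
 * quiet_room_cmn is a hypothetical array the application saved earlier. */
```

For the situation in the original question, one option along these lines would be to save the CMN while the room is quiet and restore it when the TV is muted, rather than waiting for the live estimate to catch up.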

dhdaines, Sep 28 '22 12:09