Naomi does not listen while thinking

Open • aaronchantrill opened this issue 4 years ago • 1 comment

Detailed Description

Naomi's microphone is not active while Naomi is thinking, which can lead to Naomi missing commands or parts of commands. This is especially annoying if Naomi is processing audio that turns out not to be a command while you are trying to get it to do something. I find that I need visual feedback to figure out when I can start speaking to Naomi.

Context

Currently, Naomi runs in a single loop:

1. Listen for speech.
2. Use the passive STT engine to check for the presence of a wake word.
3. Use the active STT engine to verify the wake word and extract the command audio.
4. Use the TTI engine to identify the intent.
5. Use the intent to activate the correct speech handler.
6. Add the audio output from the handler to the speech queue.
7. Return to listening.

Any sounds directed at Naomi during processing (steps 2 through 6) are never recorded. Recording continuously, independently of the processing loop, would allow Naomi to "cache" audio spoken while it is still processing earlier audio, so that the new audio can be processed afterwards.
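To make the gap in the loop described above concrete, here is a toy simulation. It is not Naomi code: the tick model and the name `simulate_blocking_loop` are hypothetical. One audio block arrives per tick, and whenever the loop reads a block it spends `processing_ticks` ticks "thinking", during which nothing is read from the microphone.

```python
from typing import List, Tuple


def simulate_blocking_loop(
    blocks: List[str], processing_ticks: int
) -> Tuple[List[str], List[str]]:
    """Toy model of a listen-then-process loop (hypothetical, not Naomi code).

    One audio block arrives per tick. After reading a block, the loop spends
    `processing_ticks` ticks on wake-word checks, STT, intent matching and
    the speech handler. The microphone is not read during that window, so
    any block arriving then is lost.
    """
    heard: List[str] = []
    dropped: List[str] = []
    busy_until = 0  # first tick at which the loop is listening again
    for tick, block in enumerate(blocks):
        if tick < busy_until:
            dropped.append(block)  # spoken while Naomi was "thinking"
            continue
        heard.append(block)
        busy_until = tick + 1 + processing_ticks
    return heard, dropped


if __name__ == "__main__":
    audio = ["cough", "naomi", "what", "time", "is", "it"]
    heard, dropped = simulate_blocking_loop(audio, processing_ticks=2)
    print("heard:", heard)
    print("dropped:", dropped)
```

With `processing_ticks=2`, the loop spends its time on the cough and drops "naomi" and "what", which is exactly the case where noise that is not a command causes the real command to be missed.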

Possible Implementation

We should be able to create a separate listener thread that uses the vad.get_audio() method to push blocks of audio into a queue for sequential processing in the main loop, or just a listening thread that puts all audio blocks into a queue that vad.get_audio() would read from. The passive listening engines seem to be fast enough that it shouldn't slow down response time too much if Naomi is still checking an earlier audio block for a wake word when you start speaking. We shall see.
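A minimal sketch of a listener thread feeding a queue, using only the standard library `threading` and `queue` modules. The class `BufferedListener` and the `read_block` callable are hypothetical: `read_block` stands in for whatever returns the next block of audio (for example a wrapper around vad.get_audio()), and returns None when the source is exhausted. The main loop then pulls blocks in order with `get_block()`, so audio spoken while it is busy waits in the queue instead of being lost.

```python
import queue
import threading
import time
from typing import Callable, Optional


class BufferedListener:
    """Reads audio blocks on a background thread and queues them (sketch).

    `read_block` is a hypothetical stand-in for the audio source; it should
    return None when there is no more audio.
    """

    def __init__(self, read_block: Callable[[], Optional[bytes]]) -> None:
        self._read_block = read_block
        self._queue: "queue.Queue[bytes]" = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Keep reading even while the main loop is busy, so nothing is missed.
        while not self._stop.is_set():
            block = self._read_block()
            if block is None:
                break
            self._queue.put(block)

    def get_block(self, timeout: Optional[float] = None) -> Optional[bytes]:
        """Return the oldest buffered block, or None if none arrives in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


if __name__ == "__main__":
    source = iter([b"naomi", b"what", b"time", b"is", b"it"])
    listener = BufferedListener(lambda: next(source, None))
    listener.start()
    while (block := listener.get_block(timeout=0.5)) is not None:
        time.sleep(0.05)  # simulate slow wake-word / STT processing
        print("processing", block)
    listener.stop()
```

Because `queue.Queue` is FIFO, blocks are processed in the order they were spoken, which matches the "sequential processing in the main loop" idea; the trade-off is that a slow main loop adds latency rather than losing audio.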

We also need some way of clearing the buffer, so that when the 'expect()' or 'confirm()' functions are used, the stream can be reset as Naomi starts asking the question. Otherwise, Naomi will quite likely accept some sound made before the question as the response. Simply clearing the buffer every time Naomi starts speaking would probably work well enough for now. We also need to be able to "pause" the buffering for users whose setup cannot cancel Naomi's own audio output from the audio input stream.
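The clearing and pausing described above could look roughly like this. `ClearableAudioBuffer` is a hypothetical helper, not part of Naomi: a listener thread would call `put()`, the main loop `get()`, `clear()` would run as Naomi starts speaking (for example before expect() or confirm() asks its question), and `pause()`/`resume()` would bracket Naomi's own speech for users without echo cancellation.

```python
import queue
import threading
from typing import Optional


class ClearableAudioBuffer:
    """Thread-safe audio buffer that can be cleared and paused (sketch)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[bytes]" = queue.Queue()
        self._lock = threading.Lock()
        self._paused = False

    def put(self, block: bytes) -> bool:
        """Buffer a block unless paused; return whether it was kept."""
        with self._lock:
            if self._paused:
                return False  # drop audio that is probably Naomi's own voice
            self._queue.put(block)
            return True

    def get(self, timeout: Optional[float] = None) -> Optional[bytes]:
        """Return the oldest block, or None if nothing arrives in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None

    def clear(self) -> int:
        """Discard everything captured so far; return how many blocks were dropped."""
        dropped = 0
        while True:
            try:
                self._queue.get_nowait()
            except queue.Empty:
                return dropped
            dropped += 1

    def pause(self) -> None:
        with self._lock:
            self._paused = True

    def resume(self) -> None:
        with self._lock:
            self._paused = False


if __name__ == "__main__":
    buf = ClearableAudioBuffer()
    buf.put(b"stray cough")           # noise captured before the question
    print("discarded:", buf.clear())  # Naomi starts asking the question
    buf.pause()                       # Naomi is talking; ignore the mic
    buf.put(b"naomi's own voice")
    buf.resume()
    buf.put(b"yes")
    print("answer:", buf.get(timeout=0.1))
```

The lock makes the paused check and the enqueue a single step, so a block cannot slip into the buffer after `pause()` has returned.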

aaronchantrill, Aug 06 '21 16:08

I have implemented buffer clearing for expect() and confirm(): just before Naomi begins asking a question, it ends the current listening session and processes anything in the buffer. This is done by setting an argument in the profile called "resetmic". The VAD plugin picks this up, immediately stops listening, and returns any captured audio before Naomi asks the question.
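As an illustration only (not the actual VAD plugin), a capture loop could honor a "resetmic" flag by checking it between frames. Assumptions: `profile` is a plain dict standing in for Naomi's profile (only the "resetmic" key comes from the comment above), `frames` and `is_speech` are hypothetical stand-ins for the microphone stream and the VAD decision, and this sketch clears the flag itself once it has been honored.

```python
from typing import Callable, Dict, Iterable, List


def capture_utterance(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    profile: Dict[str, bool],
    max_silent_frames: int = 3,
) -> List[bytes]:
    """Collect speech frames until silence, or until "resetmic" is set (sketch).

    `profile` is a dict standing in for Naomi's profile; `frames` and
    `is_speech` are hypothetical stand-ins for the mic and the VAD.
    """
    captured: List[bytes] = []
    silent = 0
    for frame in frames:
        if profile.get("resetmic"):
            # expect()/confirm() is about to ask a question: stop listening
            # now and hand back whatever was captured before the request.
            profile["resetmic"] = False  # assumption: the VAD clears the flag
            return captured
        if is_speech(frame):
            captured.append(frame)
            silent = 0
        elif captured:
            silent += 1  # trailing silence is not stored in this sketch
            if silent >= max_silent_frames:
                return captured
    return captured


if __name__ == "__main__":
    profile = {"resetmic": False}

    def mic():
        yield b"yes"
        yield b"please"
        profile["resetmic"] = True  # set just before Naomi asks a question
        yield b"after the reset"

    print(capture_utterance(mic(), lambda f: f != b"", profile))
```

Here the captured audio is `[b'yes', b'please']`; the frame read after the flag was set is not part of the returned utterance.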

A better approach would probably be to block Naomi from asking the question until the VAD indicates that the user is no longer speaking.
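That alternative could be sketched as a wait on a `threading.Event`. The names `ask_when_quiet`, `say`, and `user_silent` are hypothetical: `user_silent` is assumed to be set by a VAD thread while no speech is detected and cleared while the user is talking, and `say` stands in for Naomi's text-to-speech call. The timeout is a design choice of this sketch so that Naomi cannot hang forever if the VAD never reports silence; on timeout it asks anyway and reports that it did not get a quiet moment.

```python
import threading
from typing import Callable, List


def ask_when_quiet(
    question: str,
    say: Callable[[str], None],
    user_silent: threading.Event,
    timeout: float = 5.0,
) -> bool:
    """Wait until the VAD reports silence, then ask the question (sketch).

    Returns True if the user fell silent before `timeout` seconds, or False
    if the wait timed out. The question is asked in both cases.
    """
    quiet = user_silent.wait(timeout)  # True if set, False on timeout
    say(question)
    return quiet


if __name__ == "__main__":
    user_silent = threading.Event()  # the user is talking at first
    threading.Timer(0.2, user_silent.set).start()  # VAD detects silence later
    spoken: List[str] = []
    ask_when_quiet("Did you mean the kitchen light?", spoken.append, user_silent)
    print(spoken)
```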

aaronchantrill, Dec 10 '21 15:12