Allow modifying buffer

Open crypdick opened this issue 6 years ago • 5 comments

Right now, you have to wait for a period of silence before the buffer is parsed. It would be great to be able to force the buffer to be parsed (suggested word: "slurp") or to discard the buffer entirely (suggested word: "spit"). It would also be useful to pop the last word added to the stack (with "oops" or "scratch").

crypdick avatar Jun 08 '18 22:06 crypdick

There's not much we can do about having to wait. There's a silence segmentation component inside the server, part of alumae's kaldi-gstreamer-server, which is fairly conservative in determining silence. Kaldi has a new speech activity detection model (ASpIRE SAD) which is amazingly good. I will integrate it at some point, but that will take some time.

Discarding the buffer entirely with "spit" already works, since parsing stops if there is any unrecognized word :) but we could add a rule so that the probability of this word gets boosted. I'm not sure how well popping the last word would work: if I say something invalid, it tends to affect the recognition accuracy of several words in a row. Unless you mean to cancel an otherwise valid command?
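For illustration, here is a minimal sketch of how the three requested control words could behave on a client-side word buffer. The `WordBuffer` class and its `feed` method are hypothetical names made up for this example, not part of Silvius, and the sketch assumes the recognizer hands over one decoded word at a time.

```python
# Hypothetical sketch: none of these names exist in Silvius.
class WordBuffer:
    """Accumulates decoded words until a control word says what to do."""

    def __init__(self):
        self.words = []

    def feed(self, word):
        """Return the words to parse now, or None if nothing is ready."""
        if word == "spit":
            # Discard everything spoken so far.
            self.words.clear()
            return None
        if word in ("oops", "scratch"):
            # Pop the most recent word, if there is one.
            if self.words:
                self.words.pop()
            return None
        if word == "slurp":
            # Hand the buffer to the parser immediately and start fresh.
            ready, self.words = self.words, []
            return ready
        self.words.append(word)
        return None
```

Feeding "charlie", "delta", "foo", "oops", "slurp" in order would hand `["charlie", "delta"]` to the parser. As noted above, this only helps when the neighbouring words were recognized correctly in the first place.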

I think it would be cool to have a command that recalls part of the previous command, either up to the first parse error or up to a specific word. For example, "charlie delta space oops foo" ... "from space slash slap" -> "cd /". Thoughts? I guess it's helpful to see the parse results in this case. Do you usually have the Silvius output window visible, e.g. on a second monitor? Or perhaps we could add a little bit of X11 integration so that the output would be visible.
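A rough sketch of the "recall up to a specific word" variant, working on plain word lists. The `recall` function is a hypothetical helper for this example, and it assumes the previous utterance is kept around as a list of words even when its parse failed.

```python
# Hypothetical helper, not part of Silvius.
def recall(previous, anchor, continuation):
    """Keep `previous` up to and including the first `anchor`, then append
    `continuation`. Return None if `anchor` was never spoken."""
    if anchor not in previous:
        return None
    cut = previous.index(anchor) + 1
    return previous[:cut] + continuation


previous = "charlie delta space oops foo".split()
rebuilt = recall(previous, "space", "slash slap".split())
# rebuilt is the word list for "charlie delta space slash slap"
```

Cutting at the first occurrence is an arbitrary choice here; cutting at the last occurrence would be just as defensible.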

dwks avatar Jun 09 '18 15:06 dwks

Or "charlie delta spell slash slap", "fix word spell space" to fix just one incorrect word...
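The single-word fix could be sketched the same way. Again, `fix_word` is a hypothetical helper for illustration only, assuming the previous utterance is available as a list of words.

```python
# Hypothetical helper, not part of Silvius.
def fix_word(previous, wrong, right):
    """Return a copy of `previous` with the first `wrong` replaced by
    `right`, or None if `wrong` does not occur."""
    if wrong not in previous:
        return None
    i = previous.index(wrong)
    return previous[:i] + [right] + previous[i + 1:]


previous = "charlie delta spell slash slap".split()
fixed = fix_word(previous, "spell", "space")
# fixed is the word list for "charlie delta space slash slap"
```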

dwks avatar Jun 09 '18 15:06 dwks

I didn't know that recognition was temporally dependent like that. I often peek at the buffer when it's having trouble recognizing a word, but I really like the idea of having the buffer visualized (I'm imagining something like https://github.com/wavexx/screenkey). Finally, I also like the idea of deleting back to a certain spot.

crypdick avatar Jun 12 '18 20:06 crypdick

Sorry if this is beating a dead horse: so there's a silence detector in the server, but would it be possible client-side to manually inject the END signal and parse the buffer as-is?

crypdick avatar Jun 14 '18 02:06 crypdick

Interesting idea. So you want to be able to say "Delta left execute echo left execute..." and execute immediately once the "execute" word is decoded? That's quite possible to implement. The reason I never considered this is that most speech recognition systems have much higher accuracy in their final hypothesis, while the intermediate hypotheses can contain a lot of errors. Also, the current decoder takes slightly more CPU the longer an input phrase is, and this setup might encourage you to just keep speaking (and eventually lag a bunch until you pause).
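As a sketch of the client-side part of that idea, the words of an intermediate hypothesis could be split at each trigger word, with the trailing words held back until more speech arrives. `split_on_trigger` is a hypothetical helper, and the sketch assumes the hypothesis is available as a list of words; it does nothing about intermediate hypotheses changing after a chunk has already been executed.

```python
# Hypothetical helper, not part of Silvius.
def split_on_trigger(words, trigger="execute"):
    """Split `words` into chunks that each ended with `trigger`, plus the
    unfinished remainder. The trigger word itself is dropped."""
    chunks, current = [], []
    for word in words:
        if word == trigger:
            chunks.append(current)
            current = []
        else:
            current.append(word)
    return chunks, current


hypothesis = "delta left execute echo left execute foo".split()
chunks, pending = split_on_trigger(hypothesis)
# chunks are ready to parse now; pending waits for more input
```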

You can certainly try this, although I think the better long-term solution is to have a much faster silence detection mechanism. I believe the current one just looks for a lack of sound waves, but I recently tried an online ASpIRE speech activity detection model which is absolutely incredible at detecting when you've stopped speaking, because it understands the actual phonemes that you're speaking. For the best possible accuracy and latency at the moment, we should just use an ASpIRE SAD model and then an nnet3 ASpIRE speech model, ideally with boosted command words, though it should perform well even without them. Let me know if you have any bandwidth for this :)
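To make the contrast concrete, here is a generic sketch of the "lack of sound waves" style of detection: an endpoint is declared after a run of low-energy frames. This is not the actual kaldi-gstreamer-server implementation; the function names, the threshold, and the frame count are all assumptions chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_endpoint(frames, threshold=500.0, min_silent_frames=3):
    """Return the index of the first frame of the first silent run that is
    at least `min_silent_frames` long, or None if there is no such run.
    The default values are illustrative, not tuned."""
    silent = 0
    for i, frame in enumerate(frames):
        if rms(frame) < threshold:
            silent += 1
            if silent >= min_silent_frames:
                return i - min_silent_frames + 1
        else:
            silent = 0
    return None
```

A detector like this has to wait out `min_silent_frames` before it can commit, and making it less conservative risks cutting off mid-phrase pauses. A model-based SAD avoids that trade-off by classifying speech versus non-speech directly.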

dwks avatar Jun 14 '18 12:06 dwks