vosk-api

KWS Demo

Open SumeetGohil opened this issue 4 years ago • 16 comments

Could you please share a Keyword Search (KWS) example that uses the same AAR lib?

SumeetGohil avatar Jan 08 '20 08:01 SumeetGohil

This is not specific to Android.

nshmyrev avatar May 19 '20 16:05 nshmyrev

Greetings. Has there been any progress on this issue?

KhArtNJava avatar Jun 10 '20 17:06 KhArtNJava

It would be nice to implement https://github.com/kaldi-asr/kaldi/blob/master/src/online2bin/online2-wav-nnet3-wake-word-decoder-faster.cc

nshmyrev avatar Jun 20 '20 07:06 nshmyrev

I wish the wake word detection function were implemented in other programming languages (Python / JavaScript).

hyansuper avatar Aug 08 '20 06:08 hyansuper

As a quick fix one might init recognizer like this:

  rec = KaldiRecognizer(model, wf.getframerate(), '[ "keyphrase", "[unk]" ]')

and it will detect either the keyphrase or the [unk] token.
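A fuller version of this quick fix might look like the sketch below. It is only an illustration: the helper names `contains_keyphrase` and `detect_in_wav`, the chunk size of 4000 frames, and the model path argument are assumptions, not part of vosk-api. It assumes the recognizer returns JSON with a `text` field, and that out-of-grammar speech shows up there as `[unk]`.

```python
import json
import wave


def contains_keyphrase(result_json: str, keyphrase: str) -> bool:
    """Return True if a recognizer JSON result contains the keyphrase."""
    text = json.loads(result_json).get("text", "")
    # Pad with spaces so that "potato" does not match inside "potatoes".
    return f" {keyphrase} " in f" {text} "


def detect_in_wav(wav_path: str, model_path: str, keyphrase: str) -> bool:
    """Hypothetical helper: scan a 16-bit mono WAV file for the keyphrase."""
    # vosk is imported lazily so contains_keyphrase works without it installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        # Same grammar as the quick fix: the keyphrase plus the [unk] token.
        grammar = json.dumps([keyphrase, "[unk]"])
        rec = KaldiRecognizer(Model(model_path), wf.getframerate(), grammar)
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns True when an utterance has ended.
            if rec.AcceptWaveform(data):
                if contains_keyphrase(rec.Result(), keyphrase):
                    return True
        # Flush whatever audio is still buffered in the recognizer.
        return contains_keyphrase(rec.FinalResult(), keyphrase)
```

Note that this only checks completed utterances, so detection happens at the end of a phrase rather than the moment the keyphrase is spoken.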

nshmyrev avatar Jan 07 '21 18:01 nshmyrev

Thank you for the quick fix idea. Unfortunately, it does not seem to work in my case. If the keyword is "potato" and the speaker says "there is a potato on the table", the recognizer will detect "potato", but only after the speaker finishes the whole sentence. It would be great if the recognizer could stop and raise a flag as soon as "potato" has been pronounced.

stanislas-brossette avatar Jan 08 '21 09:01 stanislas-brossette

  1. What does [unk] mean?
  2. How do I detect more than one keyphrase? Say I want to detect "Hello" OR "Hi": how do I write that? Thank you.

Edit: now I know [unk] means noise to be filtered out.
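On the second question, one possible approach is to list each keyphrase as its own entry in the grammar string. The sketch below assumes the grammar argument is a JSON array of phrases, as in the quick fix above; `build_grammar` is a hypothetical helper, and lowercasing is an assumption based on model vocabularies typically being lowercase.

```python
import json


def build_grammar(phrases):
    """Build a grammar string: one entry per keyphrase, plus [unk]."""
    # Normalize the phrases (assumption: the model vocabulary is lowercase).
    cleaned = [p.strip().lower() for p in phrases if p.strip()]
    # [unk] absorbs everything that is not one of the keyphrases.
    return json.dumps(cleaned + ["[unk]"])


grammar = build_grammar(["Hello", "Hi"])
print(grammar)  # ["hello", "hi", "[unk]"]
```

The resulting string would be passed as the third argument to `KaldiRecognizer`, in place of the hand-written grammar.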

hyansuper avatar Jan 08 '21 14:01 hyansuper

You can try Snowboy for the KWS task. The project is not maintained anymore, but there are several good trained models (e.g. for the Alexa and Snowboy wake words) and plenty of examples available in different programming languages. I'm using it right now together with Vosk. So far so good.

sskorol avatar Jan 08 '21 16:01 sskorol

Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

stanislas-brossette avatar Jan 08 '21 17:01 stanislas-brossette

@sskorol Thanks, Snowboy works great. But the universal wake words are limited, custom-trained wake words are person-specific, and it has been shut down.

hyansuper avatar Jan 09 '21 04:01 hyansuper

> @sskorol Thanks, Snowboy works great. But the universal wake words are limited, custom-trained wake words are person-specific, and it has been shut down.

What do you mean by limited? Anyone can use them without restrictions. And as for custom wake words, do you really think the training process would be drastically different? You'd still need a lot of samples recorded by different people (age, gender, nationality, accent, etc.) to build a generic and robust model. So the problem is not in the tool. There are a number of solutions that let you train your own wake word, but the main problem is still data: no data, no generic wake word. And for exotic wake words, you won't be able to generate enough training data on your own. That's why the Snowboy devs attempted to collect data via a public crowdsourcing service. But it failed, since most people are lazy and don't want to spend their time recording wake words for someone else.

The other important thing is the resources required by the actual KWS engine. Ideally, it should be fully independent of the ASR engine and bundled into the mic array firmware, to avoid continuously streaming data over the network to the ASR engine and to avoid false-positive triggers. Moreover, you can't run Vosk on, e.g., an ESP32 (Matrix Voice) or a ReSpeaker Core, and with Snowboy you can. That's why I don't believe it's reasonable to try to solve this task with a heavy ASR engine. It should be an independent, lightweight, cross-platform API.

sskorol avatar Jan 09 '21 08:01 sskorol

> Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.
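Both limits can be combined in one loop. The following is a rough sketch, not a definitive implementation: `partial_word_count` and `recognize_limited` are hypothetical helpers, the audio is assumed to be 16-bit mono PCM, and `rec` is assumed to behave like the Python `KaldiRecognizer` (where the method is spelled `PartialResult()` and returns JSON with a `partial` field). Here the loop stops as soon as the partial result reaches `max_words`.

```python
import json


def partial_word_count(partial_json):
    """Count the words in a partial result such as {"partial": "a b c"}."""
    return len(json.loads(partial_json).get("partial", "").split())


def recognize_limited(rec, chunks, sample_rate, max_words=10, max_seconds=10.0):
    """Feed 16-bit mono PCM chunks until a word or time limit is reached."""
    max_bytes = int(max_seconds * sample_rate * 2)  # 2 bytes per sample
    fed = 0
    for chunk in chunks:
        # Time limit: stop before feeding more audio than max_seconds.
        if fed + len(chunk) > max_bytes:
            break
        fed += len(chunk)
        if rec.AcceptWaveform(chunk):
            # The recognizer found the end of an utterance on its own.
            return json.loads(rec.Result()).get("text", "")
        # Word limit: stop once the partial hypothesis is long enough.
        if partial_word_count(rec.PartialResult()) >= max_words:
            break
    # Flush the recognizer to get the text for the audio fed so far.
    return json.loads(rec.FinalResult()).get("text", "")
```

Since partial hypotheses can change as more audio arrives, the final text may not contain exactly `max_words` words.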

hyansuper avatar Jan 09 '21 14:01 hyansuper

> Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

> You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.

That's a great idea to know when to stop. Thank you.

stanislas-brossette avatar Jan 09 '21 19:01 stanislas-brossette

> Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

> You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.

> That's a great idea to know when to stop. Thank you.

Salut Stan,

Did you try this solution, and were the results acceptable? I'm also looking to handle KWS with Vosk.

pga-avionics avatar Feb 15 '21 09:02 pga-avionics

> Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

> You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.

> That's a great idea to know when to stop. Thank you.

> Salut Stan,

> Did you try this solution, and were the results acceptable? I'm also looking to handle KWS with Vosk.

Hello pga-avionics. Yes, I tried limiting the number of words in partial results and parsing the final result for my keyword, and the results are quite satisfactory. There are still some failures and some slowness in recognition, but overall it is a good workaround while waiting for a real implementation of KWS.

stanislas-brossette avatar Feb 22 '21 13:02 stanislas-brossette

> If the keyword is "potato" and the speaker says "there is a potato on the table", the recognizer will detect "potato", but only after the speaker finishes the whole sentence.

That seems untrue to me. With Vosk you can get results word by word, and so trigger your action as soon as the keyword appears.

> It would be great if the recognizer could stop and raise a flag as soon as "potato" has been pronounced.

That's currently feasible using vosk-api!
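One way this could be done is to poll the partial result after every audio chunk and trigger as soon as the keyword shows up. The sketch below is an assumption about how to wire it up, not an official recipe: `keyword_in_partial` and `wait_for_keyword` are hypothetical helpers, the keyword is assumed to be a single word, and `rec` is assumed to expose the Python `KaldiRecognizer` methods `AcceptWaveform`, `PartialResult`, `Result` and `Reset`.

```python
import json


def keyword_in_partial(partial_json, keyword):
    """True if a single-word keyword appears in a partial result."""
    words = json.loads(partial_json).get("partial", "").split()
    return keyword in words


def wait_for_keyword(rec, chunks, keyword):
    """Return the index of the chunk at which the keyword is first seen, or -1."""
    for i, chunk in enumerate(chunks):
        if rec.AcceptWaveform(chunk):
            # An utterance just ended: check its final text as well.
            text = json.loads(rec.Result()).get("text", "")
            if keyword in text.split():
                return i
            continue
        if keyword_in_partial(rec.PartialResult(), keyword):
            # Trigger mid-sentence and discard the rest of the utterance.
            rec.Reset()
            return i
    return -1
```

Partial hypotheses can be revised as more audio arrives, so triggering this early trades some accuracy for lower latency.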

solyarisoftware avatar May 11 '22 12:05 solyarisoftware

If I provide only a few words for training and testing, will it detect just those selected words?

Reethuch avatar Apr 05 '23 19:04 Reethuch