rhasspy

[Feature] Detect person

Open • jwillmer opened this issue 5 years ago • 2 comments

It would be most useful if we could train the system to recognize who said something. Depending on the person, we could then execute or ignore a command. For instance:

  • a guest in the house can't reorder (buy) supplies by talking to the voice assistant
  • the kids can't start movies via the voice assistant if the movie is not appropriate for their age
  • …
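
One way to act on a speaker label is to check it against a per-person permission table before running the recognized intent. The sketch below assumes some upstream speaker-identification step returns a label such as `"parent"` or `"kid"`, or `None` for an unrecognized voice. The intent names (`OrderSupplies`, `PlayMovie`, `GetWeather`), the `min_age` slot, and the permission table are all hypothetical and are not part of Rhasspy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    """Hypothetical per-person settings: age and the intents they may trigger."""
    age: int
    allowed_intents: set = field(default_factory=set)


# Hypothetical household profiles.
PROFILES = {
    "parent": Profile(age=40, allowed_intents={"OrderSupplies", "PlayMovie", "GetWeather"}),
    "kid": Profile(age=8, allowed_intents={"PlayMovie", "GetWeather"}),
}

# Intents that an unrecognized voice (e.g. a guest) may still trigger.
GUEST_INTENTS = {"GetWeather"}


def is_allowed(speaker: Optional[str], intent: str, slots: dict) -> bool:
    """Decide whether to run an intent for the identified speaker."""
    profile = PROFILES.get(speaker)
    if profile is None:
        # Unknown or unrecognized speaker: apply the guest policy.
        return intent in GUEST_INTENTS
    if intent not in profile.allowed_intents:
        return False
    # Age-restricted content: compare with the movie's minimum age, if any.
    return profile.age >= slots.get("min_age", 0)


print(is_allowed(None, "OrderSupplies", {}))              # False: guests can't order
print(is_allowed("kid", "PlayMovie", {"min_age": 16}))    # False: too young
print(is_allowed("parent", "PlayMovie", {"min_age": 16})) # True
```

Keeping the policy separate from recognition means a misidentified speaker falls back to the most restrictive (guest) rules rather than to someone else's permissions only when identification returns `None`; a confident wrong label would still grant that person's rights.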

jwillmer, Mar 12 '20 21:03

Kaldi apparently supports this through something called "x-vectors". I'd be interested in adding this, but I haven't had time to look into how to do basic "WAV files + labels" training for classification.
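
Once an extractor (such as Kaldi's x-vector recipe) has turned each utterance into a fixed-length vector, the "WAV files + labels" part reduces to enrollment and scoring. The sketch below is a deliberately simple backend, assuming the embeddings already exist: it averages each person's unit-normalized enrollment embeddings and picks the enrolled speaker with the highest cosine similarity, rejecting the match below a threshold. Kaldi's own x-vector recipes typically score with PLDA rather than plain cosine; cosine is swapped in here for brevity. The function names, the 0.5 threshold, and the toy 3-dimensional vectors are illustrative assumptions (real embeddings have hundreds of dimensions).

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def enroll(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Build one unit-length centroid per speaker from their enrollment embeddings."""
    centroids = {}
    for speaker, vectors in samples.items():
        normed = [_normalize(v) for v in vectors]
        dim = len(normed[0])
        mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
        centroids[speaker] = _normalize(mean)
    return centroids


def identify(embedding: Vector, centroids: Dict[str, List[float]],
             threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching speaker, or None if no score reaches the threshold."""
    query = _normalize(embedding)
    best_speaker, best_score = None, -1.0
    for speaker, centroid in centroids.items():
        score = sum(a * b for a, b in zip(query, centroid))
        if score > best_score:
            best_speaker, best_score = speaker, score
    return best_speaker if best_score >= threshold else None


# Toy usage with made-up embeddings.
centroids = enroll({
    "alice": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "bob": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
})
print(identify([0.95, 0.05, 0.05], centroids))  # alice
print(identify([0.0, 0.0, 1.0], centroids))     # None (below threshold)
```

The rejection threshold is what would let a guest's voice map to "unknown" instead of being forced onto the closest household member; its value would have to be tuned on real data.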

BTW, the kids activating Rhasspy is why I can't really use it much at home :/

synesthesiam, Mar 28 '20 19:03

I’ve tested Kaldi "i-vectors" for speaker identification, but they need a LOT of training data to reach a satisfactory error rate (a few hundred short WAVs per user is apparently the minimum).

The best I got with around 5 samples per user was a 24% error rate, following this guide: http://jrmeyer.github.io/asr/2017/09/29/challenge.html

The "x-vectors" bring some improvements, but they still need hundreds of samples per user to perform well (around a 7-8% error rate).
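
Comparing setups like these (5 versus hundreds of samples per user) needs a held-out set of labeled utterances and a single number. A minimal sketch, assuming the error rate is simply the fraction of test utterances attributed to the wrong speaker, with a rejection (`None`) counted as an error; the figures quoted above may have been computed differently (for example as an equal error rate), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # (true speaker, predicted speaker or None)


def identification_error_rate(pairs: Iterable[Pair]) -> float:
    """Fraction of test utterances whose predicted speaker differs from the truth."""
    pairs: List[Pair] = list(pairs)
    if not pairs:
        raise ValueError("no test utterances")
    errors = sum(1 for truth, predicted in pairs if truth != predicted)
    return errors / len(pairs)


def confusions(pairs: Iterable[Pair]) -> Counter:
    """Count which speaker was mistaken for which (errors only)."""
    return Counter((truth, predicted) for truth, predicted in pairs if truth != predicted)


results = [("alice", "alice"), ("alice", "bob"), ("bob", "bob"), ("bob", "bob")]
print(identification_error_rate(results))  # 0.25
print(confusions(results))                 # Counter({('alice', 'bob'): 1})
```

If the 24% above is an identification error rate in this sense, roughly one command in four would be attributed to the wrong person. The confusion counts matter for the use cases in the issue: a kid mistaken for a parent is worse than a parent rejected as a guest.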

It would be pretty awesome to achieve speaker identification though 😊

mathquis, Mar 28 '20 19:03