rhasspy
[Feature] Detect person
It would be most useful if we could train the system to identify who said something. Depending on the person, we could then execute or ignore a command. For instance:
- a guest in the house can't reorder (buy) supplies by talking to the voice assistant
- the kids can't start movies via the voice assistant if the movie is not suitable for their age
- ...
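Once a speaker label is available, the per-person rules above could be applied as a simple policy check before a command runs. This is a minimal sketch under assumptions: the speaker labels (`parent`, `kid`), the intent names (`OrderSupplies`, `PlayMovie`), the `Profile` fields and the `is_allowed` helper are all hypothetical and not part of Rhasspy. Unrecognized speakers, such as guests, fall back to the most restrictive profile.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical per-speaker permissions; labels and fields are illustrative only.
@dataclass(frozen=True)
class Profile:
    can_order: bool
    max_movie_rating: int  # highest age rating this person may start


PROFILES = {
    "parent": Profile(can_order=True, max_movie_rating=18),
    "kid": Profile(can_order=False, max_movie_rating=6),
}

# Unknown speakers (e.g. guests, or None when identification fails)
# get the most restrictive profile.
GUEST = Profile(can_order=False, max_movie_rating=0)


def is_allowed(speaker: Optional[str], intent: str, movie_rating: int = 0) -> bool:
    """Decide whether a recognized intent may run for the given speaker."""
    profile = PROFILES.get(speaker, GUEST)
    if intent == "OrderSupplies":
        return profile.can_order
    if intent == "PlayMovie":
        return movie_rating <= profile.max_movie_rating
    # Intents without special rules are allowed for everyone.
    return True
```

For example, `is_allowed("kid", "PlayMovie", movie_rating=12)` returns `False`, while `is_allowed(None, "TurnOnLights")` returns `True`. Defaulting to the guest profile keeps a misidentification from granting extra rights.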
Kaldi apparently supports this through something called "x-vectors". I'd be interested in adding this, but I haven't had time to look into how to train a basic classifier from "WAV files + labels".
BTW, the kids activating Rhasspy are why I can't really use it at home much :/
I've tested Kaldi "i-vectors" for speaker identification, but they need a LOT of training data to reach a satisfactory error rate (a few hundred short WAVs per user is apparently the minimum).
The best I got with around 5 samples per user was a 24% error rate, following this: http://jrmeyer.github.io/asr/2017/09/29/challenge.html
The "x-vectors" bring some improvement, but they still need hundreds of samples per user to perform well (around a 7-8% error rate).
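To show what the classification step and the error-rate numbers above involve, here is a minimal sketch of closed-set speaker identification on top of already-extracted embeddings (e.g. i-vectors or x-vectors exported from Kaldi as plain float lists; the extraction itself is not shown). It assumes each user's enrollment embeddings are averaged into one length-normalized centroid and that a new utterance is assigned to the centroid with the highest cosine similarity. Kaldi's x-vector recipes usually score with a PLDA backend instead; cosine scoring is a simpler substitute here. The function names and the 0.5 rejection threshold are hypothetical and would need tuning on real data.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def enroll(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average each speaker's normalized embeddings into one unit-length centroid."""
    centroids = {}
    for speaker, vecs in samples.items():
        normed = [_normalize(v) for v in vecs]
        dim = len(normed[0])
        mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
        centroids[speaker] = _normalize(mean)
    return centroids


def identify(
    embedding: Vector, centroids: Dict[str, List[float]], threshold: float = 0.5
) -> Optional[str]:
    """Return the speaker with the highest cosine similarity, or None if below threshold."""
    e = _normalize(embedding)
    best, best_score = None, -1.0
    for speaker, c in centroids.items():
        score = sum(a * b for a, b in zip(e, c))  # cosine, since both are unit length
        if score > best_score:
            best, best_score = speaker, score
    return best if best_score >= threshold else None


def error_rate(pairs: List[Tuple[Optional[str], str]]) -> float:
    """Fraction of (predicted, true) speaker pairs that disagree."""
    return sum(p != t for p, t in pairs) / len(pairs)
```

With only a handful of enrollment samples per user, each centroid is a noisy estimate, which is one plausible reason the error rates reported above stay high until many more samples are available. Returning `None` below the threshold lets unknown voices be treated as guests rather than forced onto the nearest household member.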
It would be pretty awesome to achieve speaker identification though 😊