mycroft-core

mycroft pocketsphinx has trouble with female voice wake word activation

Open fermulator opened this issue 4 years ago • 17 comments

  • I'm running Mycroft on Linux Mint 18.X
  • With version 20.2.1 of the Mycroft software
  • With the standard Wake Word "Hey Mycroft"
  • Using pocketsphinx
  • Microphone: https://partners.andreaelectronics.com/pureaudio-usb-array-microphone-bundle/

Explanation of Problem

  • For myself (male), Mycroft wake word activation works 90% of the time. I can be anywhere in room w/ the device, and speak "Hey Mycroft" clearly towards the listener mic, and it activates.
  • For my wife (female), however, Mycroft wake word has only worked <10% of the time, and it is quite frustrating for her. It ONLY works if she is within 2-3ft of the listener mic and even then often it doesn't pick up her voice for wake word.

NOTE: Once activated, Mycroft usually has no problem interpreting her voice for the command, which leads me to suspect that the pocketsphinx wake-word detection is the issue (rather than the microphone hardware).

Microphone gain is 80%, and levels read:

~ 120-130 idle/quiet
bounce between  80-110

Considerations

Being female, her voice is in a higher octave (soprano) than mine.  How can users troubleshoot and isolate wake word activation problems for individual users?  Is there a special "training" required to have Mycroft learn the typical users in a household?

Note that I am running this mostly as a PoC on an old Dell notebook. For simplicity I have switched it to pocketsphinx, as the CPU is too old to meet Precise's instruction set requirements. Is this a factor here? (When we switch to a newer device, should we expect superior performance and functionality on Precise?)

fermulator avatar Apr 26 '20 17:04 fermulator

Hey fermulator,

This is a known problem unfortunately, and it's most likely the wake word. We're doing some work on Precise at the moment and hoping to improve its handling of higher pitched voices. We're exploring some options beyond simply improving the model itself. So possibly there is something there that could translate across to pocketsphinx.

If for example we can have a higher confidence that a wake word may be getting spoken at a particular moment, we could temporarily drop the pocketsphinx threshold a little, increasing the chance of activation in that window without having it trigger more often at other times. This is all theoretical at the moment, so if anyone else has other ideas to improve PocketSphinx performance for a diverse array of voices we're definitely interested.
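As a rough illustration of the idea above (a sketch only, not how Mycroft or PocketSphinx implements anything): a detector keeps its normal threshold most of the time, and relaxes it for a short window when some other signal hints that a wake word may be being spoken. The class name `AdaptiveThreshold`, the 0..1 score scale, and the default values are all hypothetical; PocketSphinx's own threshold uses a very different scale.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveThreshold:
    """Hypothetical helper: relax a detection threshold for a short window."""

    base: float = 0.8      # score normally required to activate (assumed 0..1 scale)
    relaxed: float = 0.6   # score required while a wake word seems likely
    window: float = 2.0    # seconds the relaxed threshold stays in effect
    _relaxed_until: float = field(default=float("-inf"), init=False)

    def hint(self, now: float) -> None:
        """Record that another signal suggests a wake word may be spoken now."""
        self._relaxed_until = now + self.window

    def current(self, now: float) -> float:
        """Return the threshold in effect at time `now`."""
        return self.relaxed if now <= self._relaxed_until else self.base

    def activates(self, score: float, now: float) -> bool:
        """Decide whether a detector score at time `now` triggers activation."""
        return score >= self.current(now)


threshold = AdaptiveThreshold()
before_hint = threshold.activates(0.7, now=0.0)    # normal threshold applies
threshold.hint(now=1.0)                            # open the relaxed window
inside_window = threshold.activates(0.7, now=2.0)  # relaxed threshold applies
after_window = threshold.activates(0.7, now=5.0)   # window has expired
print(before_hint, inside_window, after_window)
```

The same borderline score is rejected outside the window and accepted inside it, so sensitivity rises only when the hint fires rather than all the time.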

krisgesling avatar Apr 28 '20 10:04 krisgesling

Pocketsphinx is pretty much a dead end; I wouldn't spend any time trying to improve it. Pocketsphinx not being good was the reason Precise was developed. Pocketsphinx is the only option for 32-bit systems, however.

JarbasAl avatar Apr 28 '20 21:04 JarbasAl

hey all, I've upgraded to picroft, and using precise now ... but the female wake word (while slightly better) is still very frustrating to use for that same female; please advise?

fermulator avatar Jun 26 '20 00:06 fermulator

Hey, we know that the wake word is not as good at detecting women's voices. The biggest issue currently is that we don't have enough training data from female users. In a way, the system not working for them is a self-fulfilling prophecy.

To fix this we're about to start some targeted data collection to make sure our wake word training samples better reflect the diversity of the population, rather than the diversity of our current user base. It would be great to get your help with that. I'll send you a message when we have a process in place to collect these.

krisgesling avatar Jun 26 '20 01:06 krisgesling

Sounds good thanks Kris; Look forward to participating and contributing voice data.

fermulator avatar Jun 30 '20 13:06 fermulator

My wife uses the workaround of making her voice lower.

JamesOsborn-SE avatar Mar 15 '21 15:03 JamesOsborn-SE

> Sounds good thanks Kris; Look forward to participating and contributing voice data.

Where is the project that houses the raw Wav data collection that she might contribute?

JamesOsborn-SE avatar Mar 15 '21 15:03 JamesOsborn-SE

> Hey, we know that the wake word is not as good at detecting women's voices. The biggest issue currently is that we don't have enough training data from female users. In a way, the system not working for them is a self-fulfilling prophecy.
>
> To fix this we're about to start some targeted data collection to make sure our wake word training samples better reflect the diversity of the population, rather than the diversity of our current user base. It would be great to get your help with that. I'll send you a message when we have a process in place to collect these.

Hey @krisgesling any news on this? We are discussing whether to pre-order a Mark 2 (at home), but that won't fly if it doesn't work for my wife ;)

P.S. We had sent some samples a while back to @MatthewScholefield , not sure what happened after that, see https://community.mycroft.ai/t/family-acceptance-factor/2273/14

shaan7 avatar Dec 29 '21 13:12 shaan7

Hey there, we have made progress on this and from all accounts women's voices in particular are much better detected. Still got more work to do but it's certainly made a big difference in my house!

krisgesling avatar Dec 29 '21 22:12 krisgesling

Oh I just realised the title of this issue is about Pocketsphinx. That's not something we're working on - but I presume you are talking about the default wake word detection using our Precise engine.

krisgesling avatar Dec 29 '21 22:12 krisgesling

That's good to hear Kris; (indeed the original issue was posted from PocketSphinx, but the Precise engine too suffered the same at the time)

fermulator avatar Dec 30 '21 02:12 fermulator

If progress has been made, is there a relevant roadmap or ticket(s) that we can cross ref that would close/duplicate this issue against other work?

fermulator avatar Dec 30 '21 02:12 fermulator

Not in the next 3 sprints so nothing overly specific right now. But we definitely will have it well in advance of the Mark II's shipping out.

krisgesling avatar Dec 30 '21 02:12 krisgesling

> Oh I just realised the title of this issue is about Pocketsphinx

Ouch my bad, I should have reported on a different issue, I indeed meant using the default "hey Mycroft" wakeword with Precise. Great to hear there's progress, thanks for the update.

shaan7 avatar Dec 30 '21 08:12 shaan7

FWIW, I've noticed the same issue, and also when my 5-year-old tries to make it respond. He can do it about 50% of the time if he uses his "man voice" (a muffled, lower, slowed-down version of his own voice that makes him giggle).

On the one hand not having it respond easily to children could be considered a feature...but on the other he really wants to use Mycroft to answer questions for him.

Maybe the Ezra project has had some success adapting to a child's voice?

mikejgray avatar May 22 '22 01:05 mikejgray

@krisgesling - are there any updates we can track/watch? Other tickets? Any testing the community can do to verify? Or sample data we can supply for testing?

fermulator avatar May 30 '22 11:05 fermulator

Hey, we haven't gotten back to the Precise improvements yet. It's still on our upcoming sprints though.

The intention is to provide a structured way for community members to contribute supplemental data, because relying on data we get from regular device usage by opted-in members is clearly a flawed premise. If you can't wake the device, then how do your voice samples ever get contributed!?!

Our focus will be on Precise, as the experience it provides is just radically better than PocketSphinx. PocketSphinx itself is also not being actively maintained (AFAIK). The one big benefit of PocketSphinx that I see is that you can define any new wake word by simply typing it into your config. With the right data and training pipeline for Precise in place, we will hopefully be able to offer the quality of Precise detection along with the flexibility of a choose-your-own-wake-word system.
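For readers wondering what "typing it into your config" might look like, here is a minimal sketch that builds such a config fragment. It assumes the `listener` / `hotwords` layout used by `mycroft.conf` in mycroft-core releases of this era; the helper name `pocketsphinx_wake_word_config`, the example phrase, and its phoneme string are illustrative assumptions, so check the key names and values against your installed version before relying on them.

```python
import json


def pocketsphinx_wake_word_config(phrase, phonemes, threshold=1e-90, lang="en-us"):
    """Build a config fragment for a custom PocketSphinx wake word.

    Hypothetical helper: the key names mirror the assumed mycroft.conf layout.
    """
    return {
        "listener": {"wake_word": phrase},
        "hotwords": {
            phrase: {
                "module": "pocketsphinx",
                "phonemes": phonemes,    # phoneme string for the phrase
                "threshold": threshold,  # detection threshold; needs tuning per phrase
                "lang": lang,
            }
        },
    }


# Example phrase and phonemes are illustrative only.
config = pocketsphinx_wake_word_config("hey computer", "HH EY . K AH M P Y UW T ER")
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the user-level config rather than replacing it. The threshold usually needs per-phrase tuning to balance missed activations against false triggers.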

krisgesling avatar May 31 '22 04:05 krisgesling