esp-sr
Want to suggest a wake word? Leave your thoughts here. (AIS-1441)
Hi all,
We're excited to offer the community more free and high-quality wake word models. Everyone has their own unique wake word preferences. Now, we're ready to regularly release some of the most popular wake words. Please let us know the wake words you want! English and Chinese are both welcome.
In the past, collecting high-quality human speech data was an expensive process. Our team has now developed a cost-effective way to train wake word models using only TTS samples, which reaches 90-95% of the accuracy of models trained on human-recorded samples.
The wake word models and esp-sr have the same license and are free for commercial use. If you want a more accurate and exclusive wake word, please use our wake word customization service.
The Willow team and community would love "Hey Willow". It's our domain name because we've been waiting for this.
Thank you very much for offering this option, it's very exciting!
I'm glad you like this. Since "hey" and "hi" sound pretty similar, people might not always notice the difference. So I was thinking we could support both "hey willow" and "hi willow" for waking up the device. That way, whether you say "hey willow" or "hi willow", it will still work. Of course, when we release the wake word model, we'll name it something like "wn9_heywillow". What do you think about that?
Good idea!
My only concern would be reduced overall accuracy (wake reliability vs false wake). We've noticed quite a bit of false wake with Alexa. From what I've read, the automated TTS approach reaches 90-95% of the accuracy of models trained on human samples. I like "two word" wake words because they tend to improve accuracy. I suspect a 100% "Hey Willow" wake word could result in equivalent or even improved accuracy with the TTS approach, even compared with the human-sample-trained Alexa?
Of course we could always test this, even starting with a pure "Hey Willow" model, a pure "Hi Willow" model, and a merged model.
Thanks again for offering this!
The problem you describe may indeed occur. We will generate both wake words and test which model performs better.
"hey/hi willow" model: wn9_heywillow_tts
FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 88%
Test dataset description:
The FAR dataset contains 64 hours of audio in total, including audio collected from the internet and audio recorded using esp32-korvo boards.
The RAR dataset contains approximately 500 samples in total, generated by multiple commercial TTS APIs.
These data and models were not used in the training process. However, because TTS samples differ from human samples, please treat the test results with caution.
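For anyone who wants to sanity-check numbers like these on their own recordings, here is a minimal Python sketch of the arithmetic. It assumes FAR means false triggers per hour of audio that does not contain the wake word, and RAR means the fraction of wake word samples that triggered. Those definitions, the function names, and the implied counts are assumptions for illustration. The thread reports only the rates and the dataset sizes.

```python
def false_alarms_per_hour(false_alarms: int, hours: float) -> float:
    """FAR, assumed here to be false triggers per hour of non-wake-word audio."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return false_alarms / hours


def right_alarm_rate(detected: int, total: int) -> float:
    """RAR, assumed here to be the fraction of wake word samples that triggered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


# Figures reported in the thread for wn9_heywillow_tts.
FAR_DATASET_HOURS = 64
REPORTED_FAR_PER_HOUR = 1 / 8      # "1 time / 8 hours"
RAR_DATASET_SAMPLES = 500          # "approximately 500" in the thread
REPORTED_RAR = 0.88

# Counts implied by those figures (derived here, not reported in the thread).
implied_false_alarms = round(REPORTED_FAR_PER_HOUR * FAR_DATASET_HOURS)
implied_detections = round(REPORTED_RAR * RAR_DATASET_SAMPLES)

if __name__ == "__main__":
    print(f"implied false alarms over {FAR_DATASET_HOURS} h: {implied_false_alarms}")
    print(f"implied detections out of ~{RAR_DATASET_SAMPLES}: {implied_detections}")
```

Under these assumptions, the reported rates correspond to roughly 8 false triggers across the 64-hour set and roughly 440 detections out of about 500 TTS samples.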
Guys, what you are doing is really great. We have created a smart speaker called Homai based on the esp32-s3. We trained the model ourselves, but it is resource-intensive and not so easy to integrate into the pipeline. Could you please add support for our word Homai [ho'mai]? Thank you in advance!
Hi @AigizK, "Homai" has only two syllables. It is difficult to reduce the probability of false triggering for monosyllabic and disyllabic phrases. We recommend selecting a 3-5 syllable phrase as the wake word.
Hi @sun-xiangyu, we have already launched a project with this name, so we can't change it significantly. But can we use the variant "homa ai", where the sound 'A' is pronounced long?
I'm sorry that our TTS model cannot specify a syllable to extend its pronunciation at the moment. This means that we cannot generate a large number of accurate “homa ai” phrases.
Hi! Thank you for this awesome solution! We are developing a smart voice assistant called Sophia. Would it be possible to have the wake word "Hi Sophia"? This would help our user experience drastically. Thank you in advance!
Hi @PrathamG , I'm glad you like it. "Sophia" sounds like a wake word that can be used directly. I mean, maybe we don't need an extra prefix "Hi". I suggest we start with just "Sophia". If the performance is not satisfactory, then we can train another one with "hi Sophia". What do you think?
Sure, that sounds like a good plan! We can use only "Sophia" and test the performance first. Thank you
If possible, I also wanted to request the wake word "Little Sophia". We are still unsure about which wake word to use, and having both options will help us determine this via user testing.
Our computing resources are currently limited. This project can produce about two wake word models per month, so we will prioritize popular wake words. Of course, if we have some free time, "Little Sophia" is also fine.
No worries, totally understandable! Looking forward to testing out the "Sophia" wake word
"Sophia" model: wn9_sophia_tts
FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 97%
“小美” or “小美同学” would be a perfect choice. It would suit a lot of use cases. We all want a wake word that sounds like a human name.
@xygh, “小美同学” sounds good.
Thank you! We will test it out and report the results by next week
BTW, “你好小美” is also a perfect choice.
"小当家" or "Hi 小星" would be the preferred wake word in our scenario. Thanks a lot!
The second version "Sophia":
model info: wakenet9l_tts1h8v2_Sophia_3_0.647_0.649
Performance: FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 95%
Improvement: Add "Sophie" and "Sophy" as hard negatives to reduce false triggers.
Both of these words sound good. If you have no preference, we will choose "hi 小星".
"小美同学" model info: wakenet9l_tts1h8_小美同学_3_0.633_0.644
FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 95%
Hello! This is a great opportunity I was hoping would come up, I'm so glad this is now possible! I've seen that the wake-words "Mycroft" and "Hey, Mycroft" are very popular in the community, and it is also the name of my product so would very much improve user experience. Would it be possible to have either of these trained and released for the community? Thank you so much in advance for this!
@lewardo, I'm glad it could help you. Although "Mycroft" is simpler, it seems there are quite a few words that sound similar, so I'll prioritize training with "Hey Mycroft."
@Henry586 ,
Hi,小星: wakenet9l_tts1h8_Hi,小星_3_0.626_0.630
Performance: FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 93%
I'd love to have "hey printer" available as a wake word/phrase.
I want to suggest a wake word, "小龙小龙". I'm glad to hear that you can create wake words.
@lewardo, the performance of "Mycroft" also looks good. Please try it. Mycroft: wakenet9l_tts1h8_Mycroft_3_0.625_0.629
Performance: FAR (False Alarm Rate): 1 time / 8 hours; RAR (Right Alarm Rate): 96%