esp-va-sdk
about "play song" and "sing a song"
Hello, when using the Alexa SDK, I encountered two problems. 1: When I say "play song", I can only see the VA_listening, VA_thinking, and VA_idle states; I cannot tell when music playback begins and ends. Can you tell me how to get this? 2: When I say "sing a song", playback is smooth. But when I say "play song", playback is not continuous. If I call va_dsp_mic_mute(1) at that point, playback becomes smooth. I noticed that "sing a song" is delivered in MP3 format, while "play song" uses the AAC codec. Can you tell me why this happens?
Hi @DuHeLong For 1: audio playback events are handled internally by the SDK and are not exposed to the application yet. All dialog-related events have been exposed so that anyone with custom hardware can use them to drive their LEDs or speakers, because the Alexa specification mandates some UI for these events. The Alexa specification does not yet mandate anything for audio-playback-specific events that would require app-specific or board-specific handling. Could you please tell us whether you have any specific requirements for these events?
For 2: AAC decoding is more CPU-intensive than MP3 decoding. Since wake-word detection also runs on the host, the decoder may not be getting enough CPU cycles to play the song smoothly.
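To illustrate point 1, here is a minimal sketch of how an application might map the dialog states named in this thread (idle, listening, thinking) to LED patterns. Every name in it (`app_dialog_state_t`, `app_led_pattern_t`, `app_led_pattern_for_state`) is hypothetical and defined locally for the example; none of them is the SDK's actual API, and the pattern choices are arbitrary.

```c
#include <assert.h>

/* Hypothetical application-side mirror of the dialog states mentioned in
 * this thread. These are NOT the SDK's own definitions. */
typedef enum {
    APP_DIALOG_IDLE = 0,
    APP_DIALOG_LISTENING,
    APP_DIALOG_THINKING,
    APP_DIALOG_STATE_COUNT
} app_dialog_state_t;

/* Hypothetical LED patterns a custom board might support. */
typedef enum {
    APP_LED_OFF = 0,
    APP_LED_SOLID,
    APP_LED_BLINK
} app_led_pattern_t;

/* Pick an LED pattern for a dialog state. The mapping is an arbitrary
 * choice for this sketch. */
app_led_pattern_t app_led_pattern_for_state(app_dialog_state_t state)
{
    /* Reject values outside the enum range during development. */
    assert((int)state >= 0 && (int)state < (int)APP_DIALOG_STATE_COUNT);

    switch (state) {
    case APP_DIALOG_LISTENING:
        return APP_LED_SOLID;   /* device is capturing speech */
    case APP_DIALOG_THINKING:
        return APP_LED_BLINK;   /* waiting for the cloud response */
    case APP_DIALOG_IDLE:
    default:
        return APP_LED_OFF;     /* nothing to indicate */
    }
}
```

An application would call `app_led_pattern_for_state()` from whatever hook it uses to receive dialog state changes and pass the result to its own LED driver. Playback start and end have no entry here because, as noted above, those events are not exposed to the application.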
Thanks. For 1: our application needs this event state. Can you expose it for us?
For 2: Alexa has two major functions: one is dialogue, the other is playing songs. In our tests, whenever we use the word "play", the stream we get is in AAC format and playback is not smooth, so the user experience is very bad. When you test with the word "play", do you also get AAC, or some other format?
Hi @DuHeLong The solution here is a prototype. The board handles all wake-word (WW) detection locally, and that consumes a considerable amount of CPU.
AAC playback stutters because AAC decoding needs far more CPU than MP3 decoding.
For a final product, you are expected to have separate hardware for WW detection. In that case playback should be smooth in all cases.
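The CPU argument above can be sketched as a simple budget check. This is only an illustration under stated assumptions: it treats wake-word detection and audio decoding as sharing a single CPU budget, and the function name `playback_keeps_up` and its percentage inputs are hypothetical. Real load figures would have to be measured on the board; none are given in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when wake-word detection and audio decoding together fit
 * within one shared CPU budget. Both loads are percentages (0 to 100) of
 * that budget, supplied by the caller from their own measurements.
 * Assumption for this sketch: both tasks compete for the same budget. */
bool playback_keeps_up(unsigned wakeword_pct, unsigned decoder_pct)
{
    assert(wakeword_pct <= 100 && decoder_pct <= 100);
    return (wakeword_pct + decoder_pct) <= 100;
}
```

With a fixed wake-word load, a lighter decoder can fit in the remaining budget while a heavier one does not, which matches the observation that MP3 plays smoothly and AAC stutters. Moving wake-word detection to separate hardware corresponds to setting `wakeword_pct` to 0.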
Hello, thank you for your support. To solve this problem, we want to use two ESP32 chips, with one ESP32 dedicated to the wake-up function. However, we found that when running ESP_VA_SDK, the device must be connected to the network before it enters wake-up mode. Is there any way to make this work without a network?
Hi @DuHeLong Please note that if you are looking to create an Alexa built-in product using ESP32, you need to use one of the acoustically certified DSPs, as mentioned here. LyraT-based solutions are not certified and are only for prototyping or creating a PoC. If you need further assistance with this, you can reach me at my email ID.