firmware
firmware copied to clipboard
ESP-S3-BOX-3: Unexpected error during wake-word-detection
I flashed my ESP-S3-Box-3 by using the esphome.io website. I had connected the box to HA. The waiting house symbol appears on the screen. There is a little snap sound after the startup was successful. After saying "ok nabu" the orange house (error) appears and no more response is possible. I'm using piper, openwakeword and whisper as local addons in my HA supervised installation.
ESPHOME shows the following output:
[22:06:00][D][api.connection:1089]: Home Assistant 2023.12.3 (xx.xx.xx.xx): Connected successfully
[22:06:00][W][component:214]: Component api took a long time for an operation (0.28 s).
[22:06:00][W][component:215]: Components should block for at most 20-30ms.
[22:06:01][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[22:06:01][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[22:06:01][W][component:214]: Component api took a long time for an operation (0.28 s).
[22:06:01][W][component:215]: Components should block for at most 20-30ms.
[22:06:01][D][voice_assistant:159]: Starting Microphone
[22:06:01][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:06:01][D][esp-idf:000]: I (25378) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[22:06:01][D][esp-idf:000]: I (25384) I2S: I2S0, MCLK output by GPIO2
[22:06:01][D][esp-idf:000]: I (25390) AUDIO_PIPELINE: link el->rb, el:0x3d036c54, tag:i2s, rb:0x3d037068
[22:06:01][D][esp-idf:000]: I (25393) AUDIO_PIPELINE: link el->rb, el:0x3d036dc8, tag:filter, rb:0x3d0390a8
[22:06:01][D][esp-idf:000]: I (25399) AUDIO_ELEMENT: [i2s-0x3d036c54] Element task created
[22:06:01][D][esp-idf:000]: I (25402) AUDIO_THREAD: The filter task allocate stack on external memory
[22:06:01][D][esp-idf:000]: I (25409) AUDIO_ELEMENT: [filter-0x3d036dc8] Element task created
[22:06:01][D][esp-idf:000]: I (25415) AUDIO_ELEMENT: [raw-0x3d036ef8] Element task created
[22:06:01][D][esp-idf:000]: I (25419) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16586299 Bytes, Inter:50964 Bytes, Dram:50964 Bytes
[22:06:01][D][esp-idf:000]: I (25422) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[22:06:01][D][esp-idf:000]: I (25425) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[22:06:01][D][esp-idf:000]: I (25429) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[22:06:01][D][esp-idf:000]: I (25432) AUDIO_PIPELINE: Pipeline started
[22:06:01][D][esp_adf.microphone:294]: Microphone started
[22:06:01][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[22:06:01][D][voice_assistant:176]: Waiting for speech...
[22:06:01][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[22:06:06][D][voice_assistant:189]: VAD detected speech
[22:06:06][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[22:06:06][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:206]: Requesting start...
[22:06:06][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[22:06:06][D][voice_assistant:443]: Client started, streaming microphone
[22:06:06][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:529]: Event Type: 0
[22:06:06][E][voice_assistant:656]: Error: no_wake_word - No wake word detected
[22:06:06][D][voice_assistant:522]: Signaling stop...
[22:06:06][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[22:06:06][D][voice_assistant:428]: Desired state set to IDLE
[22:06:06][D][esp_adf.microphone:256]: Stopping microphone
[22:06:06][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[22:06:06][D][esp-idf:000]: W (30571) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[22:06:06][D][esp-idf:000]: E (30575) AUDIO_ELEMENT: [filter] Element already stopped
[22:06:06][D][esp-idf:000]: W (30603) AUDIO_PIPELINE: There are no listener registered
[22:06:06][D][esp-idf:000]: I (30607) AUDIO_PIPELINE: audio_pipeline_unlinked
[22:06:06][D][esp-idf:000]: W (30612) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[22:06:06][D][esp-idf:000]: I (30617) I2S: DMA queue destroyed
[22:06:06][D][esp-idf:000]: W (30624) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[22:06:07][D][esp-idf:000]: W (30629) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[22:06:07][W][component:214]: Component voice_assistant took a long time for an operation (0.28 s).
[22:06:07][W][component:215]: Components should block for at most 20-30ms.
[22:06:07][D][voice_assistant:529]: Event Type: 2
[22:06:07][D][voice_assistant:619]: Assist Pipeline ended
[22:06:07][D][voice_assistant:529]: Event Type: 1
[22:06:07][D][voice_assistant:532]: Assist Pipeline running
[22:06:07][D][voice_assistant:529]: Event Type: 9
[22:06:07][D][esp_adf.microphone:306]: Microphone stopped
[22:06:07][D][voice_assistant:529]: Event Type: 0
[22:06:07][E][voice_assistant:656]: Error: wake-stream-failed - Unexpected error during wake-word-detection
[22:06:07][D][voice_assistant:522]: Signaling stop...
[22:06:07][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to STOP_MICROPHONE
[22:06:07][D][voice_assistant:428]: Desired state set to IDLE
[22:06:07][D][voice_assistant:422]: State changed from STOP_MICROPHONE to IDLE
[22:06:07][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:07][W][component:215]: Components should block for at most 20-30ms.
[22:06:07][D][voice_assistant:529]: Event Type: 2
[22:06:07][D][voice_assistant:619]: Assist Pipeline ended
[22:06:07][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[22:06:07][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[22:06:07][D][voice_assistant:159]: Starting Microphone
[22:06:07][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:06:07][D][esp-idf:000]: I (31287) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[22:06:07][D][esp-idf:000]: I (31292) I2S: I2S0, MCLK output by GPIO2
[22:06:07][D][esp-idf:000]: I (31299) AUDIO_PIPELINE: link el->rb, el:0x3d036c54, tag:i2s, rb:0x3d037068
[22:06:07][D][esp-idf:000]: I (31303) AUDIO_PIPELINE: link el->rb, el:0x3d036dc8, tag:filter, rb:0x3d0390a8
[22:06:07][D][esp-idf:000]: I (31310) AUDIO_ELEMENT: [i2s-0x3d036c54] Element task created
[22:06:07][D][esp-idf:000]: I (31313) AUDIO_THREAD: The filter task allocate stack on external memory
[22:06:07][D][esp-idf:000]: I (31318) AUDIO_ELEMENT: [filter-0x3d036dc8] Element task created
[22:06:07][D][esp-idf:000]: I (31323) AUDIO_ELEMENT: [raw-0x3d036ef8] Element task created
[22:06:07][D][esp-idf:000]: I (31326) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16586747 Bytes, Inter:51412 Bytes, Dram:51412 Bytes
[22:06:07][D][esp-idf:000]: I (31330) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[22:06:07][D][esp-idf:000]: I (31332) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[22:06:07][D][esp-idf:000]: I (31336) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[22:06:07][D][esp-idf:000]: I (31339) AUDIO_PIPELINE: Pipeline started
[22:06:07][D][esp_adf.microphone:294]: Microphone started
[22:06:07][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[22:06:07][D][voice_assistant:176]: Waiting for speech...
[22:06:07][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[22:06:08][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:08][W][component:215]: Components should block for at most 20-30ms.
[22:06:08][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:08][W][component:215]: Components should block for at most 20-30ms.
HA assist debug:
stage: done
run:
pipeline: 01hhz9tr3494bh7s6fvc967ncw
language: en
events:
- type: run-start
data:
pipeline: 01hhz9tr3494bh7s6fvc967ncw
language: en
timestamp: "2023-12-18T21:06:06.836622+00:00"
- type: wake_word-start
data:
entity_id: wake_word.openwakeword
metadata:
format: wav
codec: pcm
bit_rate: 16
sample_rate: 16000
channel: 1
timeout: 5
timestamp: "2023-12-18T21:06:06.837100+00:00"
- type: error
data:
code: wake-stream-failed
message: Unexpected error during wake-word-detection
timestamp: "2023-12-18T21:06:06.847630+00:00"
- type: run-end
data: null
timestamp: "2023-12-18T21:06:06.848318+00:00"
wake_word:
entity_id: wake_word.openwakeword
metadata:
format: wav
codec: pcm
bit_rate: 16
sample_rate: 16000
channel: 1
timeout: 5
done: false
error:
code: wake-stream-failed
message: Unexpected error during wake-word-detection
Very similar issue here. The same errors from ESP side:
` [D][esp-idf:000]: I (14121) AUDIO_PIPELINE: Pipeline started [D][esp_adf.microphone:264]: Microphone started [D][voice_assistant:410]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD [D][voice_assistant:170]: Waiting for speech... [D][voice_assistant:410]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [D][voice_assistant:183]: VAD detected speech [D][voice_assistant:410]: State changed from WAITING_FOR_VAD to START_PIPELINE [D][voice_assistant:416]: Desired state set to STREAMING_MICROPHONE [D][voice_assistant:200]: Requesting start... [D][voice_assistant:410]: State changed from START_PIPELINE to STARTING_PIPELINE [D][voice_assistant:431]: Client started, streaming microphone [D][voice_assistant:410]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [D][voice_assistant:416]: Desired state set to STREAMING_MICROPHONE [D][voice_assistant:517]: Event Type: 0 [E][voice_assistant:644]: Error: no_wake_word - No wake word detected [D][voice_assistant:510]: Signaling stop... [D][voice_assistant:410]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE [D][voice_assistant:416]: Desired state set to IDLE [D][esp_adf.microphone:226]: Stopping microphone [D][voice_assistant:410]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [D][esp-idf:000]: W (72547) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[D][esp-idf:000]: E (72549) AUDIO_ELEMENT: [filter] Element already stopped
[D][esp-idf:000]: W (72580) AUDIO_PIPELINE: There are no listener registered
[D][esp-idf:000]: I (72583) AUDIO_PIPELINE: audio_pipeline_unlinked
[D][esp-idf:000]: W (72585) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: I (72587) I2S: DMA queue destroyed
[D][esp-idf:000]: W (72589) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: W (72593) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[W][component:214]: Component voice_assistant took a long time for an operation (0.21 s). [W][component:215]: Components should block for at most 20-30ms. [D][voice_assistant:517]: Event Type: 2 [D][voice_assistant:607]: Assist Pipeline ended [D][voice_assistant:517]: Event Type: 1 [D][voice_assistant:520]: Assist Pipeline running [D][esp_adf.microphone:276]: Microphone stopped [D][voice_assistant:517]: Event Type: 9 [D][voice_assistant:410]: State changed from STOPPING_MICROPHONE to IDLE [D][voice_assistant:517]: Event Type: 0 [E][voice_assistant:644]: Error: wake-stream-failed - Unexpected error during wake-word-detection [D][voice_assistant:410]: State changed from IDLE to START_MICROPHONE [D][voice_assistant:416]: Desired state set to WAIT_FOR_VAD [W][component:214]: Component voice_assistant took a long time for an operation (0.21 s). [W][component:215]: Components should block for at most 20-30ms. [D][voice_assistant:517]: Event Type: 2 [D][voice_assistant:607]: Assist Pipeline ended [D][voice_assistant:153]: Starting Microphone [D][voice_assistant:410]: State changed from START_MICROPHONE to STARTING_MICROPHONE [D][esp-idf:000]: I (72987) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[D][esp-idf:000]: I (72992) I2S: I2S0, MCLK output by GPIO2 [D][esp-idf:000]: I (72996) AUDIO_PIPELINE: link el->rb, el:0x3d036c2c, tag:i2s, rb:0x3d037040
[D][esp-idf:000]: I (72998) AUDIO_PIPELINE: link el->rb, el:0x3d036da0, tag:filter, rb:0x3d039080 [D][esp-idf:000]: I (73003) AUDIO_ELEMENT: [i2s-0x3d036c2c] Element task created0m [D][esp-idf:000]: I (73005) AUDIO_THREAD: The filter task allocate stack on external memory
[D][esp-idf:000]: I (73008) AUDIO_ELEMENT: [filter-0x3d036da0] Element task created
[D][esp-idf:000]: I (73010) AUDIO_ELEMENT: [raw-0x3d036ed0] Element task created `
Did you manage to resolve it somehow?
Sadly not, its still not working
Issue still exists after upgrading to ESPHome 2023.12.9. Doesn't make a difference if I use openwakeword or porcupine1. Both times its crashing after saying something (even if its not the wake word)....
Exactly the same symptoms here. What's more, I was able to reinstall original firmware and it reacts to "OK E.S.P." with no issues. Also understands the commands, to it's not a hardware issue.
After switching to local wake word the detection works like expected. https://github.com/esphome/firmware/tree/main/wake-word-voice-assistant
When switching to wake word engine location from "on device" to "in Home Assistant" the error still exists. So re-opening this ticket because its solved when using local wake word detection but one may still use the Home Assistant detection...