firmware ESP-S3-BOX-3: Unexpected error during wake-word-detection

I flashed my ESP-S3-BOX-3 using the esphome.io website and connected the box to HA. The waiting house symbol appears on the screen, and there is a short snap sound after a successful startup. After I say "ok nabu", the orange house (error) appears and the box no longer responds. I'm using piper, openwakeword and whisper as local add-ons in my HA supervised installation.

ESPHome shows the following output:

[22:06:00][D][api.connection:1089]: Home Assistant 2023.12.3 (xx.xx.xx.xx): Connected successfully
[22:06:00][W][component:214]: Component api took a long time for an operation (0.28 s).
[22:06:00][W][component:215]: Components should block for at most 20-30ms.
[22:06:01][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[22:06:01][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[22:06:01][W][component:214]: Component api took a long time for an operation (0.28 s).
[22:06:01][W][component:215]: Components should block for at most 20-30ms.
[22:06:01][D][voice_assistant:159]: Starting Microphone
[22:06:01][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:06:01][D][esp-idf:000]: I (25378) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[22:06:01][D][esp-idf:000]: I (25384) I2S: I2S0, MCLK output by GPIO2

[22:06:01][D][esp-idf:000]: I (25390) AUDIO_PIPELINE: link el->rb, el:0x3d036c54, tag:i2s, rb:0x3d037068

[22:06:01][D][esp-idf:000]: I (25393) AUDIO_PIPELINE: link el->rb, el:0x3d036dc8, tag:filter, rb:0x3d0390a8

[22:06:01][D][esp-idf:000]: I (25399) AUDIO_ELEMENT: [i2s-0x3d036c54] Element task created

[22:06:01][D][esp-idf:000]: I (25402) AUDIO_THREAD: The filter task allocate stack on external memory

[22:06:01][D][esp-idf:000]: I (25409) AUDIO_ELEMENT: [filter-0x3d036dc8] Element task created

[22:06:01][D][esp-idf:000]: I (25415) AUDIO_ELEMENT: [raw-0x3d036ef8] Element task created

[22:06:01][D][esp-idf:000]: I (25419) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16586299 Bytes, Inter:50964 Bytes, Dram:50964 Bytes


[22:06:01][D][esp-idf:000]: I (25422) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[22:06:01][D][esp-idf:000]: I (25425) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[22:06:01][D][esp-idf:000]: I (25429) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[22:06:01][D][esp-idf:000]: I (25432) AUDIO_PIPELINE: Pipeline started

[22:06:01][D][esp_adf.microphone:294]: Microphone started
[22:06:01][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[22:06:01][D][voice_assistant:176]: Waiting for speech...
[22:06:01][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[22:06:06][D][voice_assistant:189]: VAD detected speech
[22:06:06][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[22:06:06][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:206]: Requesting start...
[22:06:06][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[22:06:06][D][voice_assistant:443]: Client started, streaming microphone
[22:06:06][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[22:06:06][D][voice_assistant:529]: Event Type: 0
[22:06:06][E][voice_assistant:656]: Error: no_wake_word - No wake word detected
[22:06:06][D][voice_assistant:522]: Signaling stop...
[22:06:06][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[22:06:06][D][voice_assistant:428]: Desired state set to IDLE
[22:06:06][D][esp_adf.microphone:256]: Stopping microphone
[22:06:06][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[22:06:06][D][esp-idf:000]: W (30571) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[22:06:06][D][esp-idf:000]: E (30575) AUDIO_ELEMENT: [filter] Element already stopped

[22:06:06][D][esp-idf:000]: W (30603) AUDIO_PIPELINE: There are no listener registered

[22:06:06][D][esp-idf:000]: I (30607) AUDIO_PIPELINE: audio_pipeline_unlinked

[22:06:06][D][esp-idf:000]: W (30612) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:06:06][D][esp-idf:000]: I (30617) I2S: DMA queue destroyed

[22:06:06][D][esp-idf:000]: W (30624) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:06:07][D][esp-idf:000]: W (30629) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:06:07][W][component:214]: Component voice_assistant took a long time for an operation (0.28 s).
[22:06:07][W][component:215]: Components should block for at most 20-30ms.
[22:06:07][D][voice_assistant:529]: Event Type: 2
[22:06:07][D][voice_assistant:619]: Assist Pipeline ended
[22:06:07][D][voice_assistant:529]: Event Type: 1
[22:06:07][D][voice_assistant:532]: Assist Pipeline running
[22:06:07][D][voice_assistant:529]: Event Type: 9
[22:06:07][D][esp_adf.microphone:306]: Microphone stopped
[22:06:07][D][voice_assistant:529]: Event Type: 0
[22:06:07][E][voice_assistant:656]: Error: wake-stream-failed - Unexpected error during wake-word-detection
[22:06:07][D][voice_assistant:522]: Signaling stop...
[22:06:07][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to STOP_MICROPHONE
[22:06:07][D][voice_assistant:428]: Desired state set to IDLE
[22:06:07][D][voice_assistant:422]: State changed from STOP_MICROPHONE to IDLE
[22:06:07][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:07][W][component:215]: Components should block for at most 20-30ms.
[22:06:07][D][voice_assistant:529]: Event Type: 2
[22:06:07][D][voice_assistant:619]: Assist Pipeline ended
[22:06:07][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[22:06:07][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[22:06:07][D][voice_assistant:159]: Starting Microphone
[22:06:07][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:06:07][D][esp-idf:000]: I (31287) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[22:06:07][D][esp-idf:000]: I (31292) I2S: I2S0, MCLK output by GPIO2

[22:06:07][D][esp-idf:000]: I (31299) AUDIO_PIPELINE: link el->rb, el:0x3d036c54, tag:i2s, rb:0x3d037068

[22:06:07][D][esp-idf:000]: I (31303) AUDIO_PIPELINE: link el->rb, el:0x3d036dc8, tag:filter, rb:0x3d0390a8

[22:06:07][D][esp-idf:000]: I (31310) AUDIO_ELEMENT: [i2s-0x3d036c54] Element task created

[22:06:07][D][esp-idf:000]: I (31313) AUDIO_THREAD: The filter task allocate stack on external memory

[22:06:07][D][esp-idf:000]: I (31318) AUDIO_ELEMENT: [filter-0x3d036dc8] Element task created

[22:06:07][D][esp-idf:000]: I (31323) AUDIO_ELEMENT: [raw-0x3d036ef8] Element task created

[22:06:07][D][esp-idf:000]: I (31326) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16586747 Bytes, Inter:51412 Bytes, Dram:51412 Bytes


[22:06:07][D][esp-idf:000]: I (31330) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[22:06:07][D][esp-idf:000]: I (31332) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[22:06:07][D][esp-idf:000]: I (31336) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[22:06:07][D][esp-idf:000]: I (31339) AUDIO_PIPELINE: Pipeline started

[22:06:07][D][esp_adf.microphone:294]: Microphone started
[22:06:07][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[22:06:07][D][voice_assistant:176]: Waiting for speech...
[22:06:07][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[22:06:08][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:08][W][component:215]: Components should block for at most 20-30ms.
[22:06:08][W][component:214]: Component voice_assistant took a long time for an operation (0.27 s).
[22:06:08][W][component:215]: Components should block for at most 20-30ms.
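
To pick the relevant lines out of a long log like the one above, a small filter can help. This is only a sketch: the regular expression is an assumption based on the line format visible in this log (`[time][level][component:line]: message`), and the function name `voice_assistant_errors` is made up for illustration; it is not part of ESPHome.

```python
import re

# Assumed line format, taken from the log above, e.g.:
# [22:06:06][E][voice_assistant:656]: Error: no_wake_word - No wake word detected
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]\[(?P<level>[A-Z])\]"
    r"\[(?P<component>[\w.\-]+):(?P<line>\d+)\]: (?P<message>.*)$"
)


def voice_assistant_errors(log_text):
    """Return (time, message) pairs for error-level voice_assistant lines."""
    errors = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match and match["level"] == "E" and match["component"] == "voice_assistant":
            errors.append((match["time"], match["message"]))
    return errors


# A few lines copied from the log above.
SAMPLE = """\
[22:06:06][D][voice_assistant:529]: Event Type: 0
[22:06:06][E][voice_assistant:656]: Error: no_wake_word - No wake word detected
[22:06:07][D][esp_adf.microphone:306]: Microphone stopped
[22:06:07][E][voice_assistant:656]: Error: wake-stream-failed - Unexpected error during wake-word-detection
"""

for time, message in voice_assistant_errors(SAMPLE):
    print(time, message)
```

Run against the full log, this leaves just the two error lines: `no_wake_word` at 22:06:06, followed by `wake-stream-failed` at 22:06:07.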

HA assist debug:

stage: done
run:
  pipeline: 01hhz9tr3494bh7s6fvc967ncw
  language: en
events:
  - type: run-start
    data:
      pipeline: 01hhz9tr3494bh7s6fvc967ncw
      language: en
    timestamp: "2023-12-18T21:06:06.836622+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 5
    timestamp: "2023-12-18T21:06:06.837100+00:00"
  - type: error
    data:
      code: wake-stream-failed
      message: Unexpected error during wake-word-detection
    timestamp: "2023-12-18T21:06:06.847630+00:00"
  - type: run-end
    data: null
    timestamp: "2023-12-18T21:06:06.848318+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 5
  done: false
error:
  code: wake-stream-failed
  message: Unexpected error during wake-word-detection
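
One detail in the debug output above is the timing: the error event arrives almost immediately after `wake_word-start`, long before the 5-second timeout. The sketch below just does that subtraction using the two timestamps copied from the events; the helper name `seconds_between` is made up for illustration. Reading this as "the stream fails right away instead of timing out" is an inference, not something confirmed in this thread.

```python
from datetime import datetime

# Timestamps copied from the debug output above.
WAKE_WORD_START = "2023-12-18T21:06:06.837100+00:00"
ERROR = "2023-12-18T21:06:06.847630+00:00"
TIMEOUT_SECONDS = 5  # "timeout: 5" in the wake_word-start event


def seconds_between(start_iso, end_iso):
    """Elapsed seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()


elapsed = seconds_between(WAKE_WORD_START, ERROR)
print(f"error after {elapsed * 1000:.2f} ms (timeout is {TIMEOUT_SECONDS} s)")
```

The gap is roughly 10 ms, so the pipeline appears to give up as soon as wake word detection starts rather than waiting for audio.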

Dec 18 '23 21:12 sti0

Very similar issue here. The same errors from ESP side:

`
[D][esp-idf:000]: I (14121) AUDIO_PIPELINE: Pipeline started
[D][esp_adf.microphone:264]: Microphone started
[D][voice_assistant:410]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:170]: Waiting for speech...
[D][voice_assistant:410]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[D][voice_assistant:183]: VAD detected speech
[D][voice_assistant:410]: State changed from WAITING_FOR_VAD to START_PIPELINE
[D][voice_assistant:416]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:200]: Requesting start...
[D][voice_assistant:410]: State changed from START_PIPELINE to STARTING_PIPELINE
[D][voice_assistant:431]: Client started, streaming microphone
[D][voice_assistant:410]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:416]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:517]: Event Type: 0
[E][voice_assistant:644]: Error: no_wake_word - No wake word detected
[D][voice_assistant:510]: Signaling stop...
[D][voice_assistant:410]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[D][voice_assistant:416]: Desired state set to IDLE
[D][esp_adf.microphone:226]: Stopping microphone
[D][voice_assistant:410]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[D][esp-idf:000]: W (72547) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[D][esp-idf:000]: E (72549) AUDIO_ELEMENT: [filter] Element already stopped
[D][esp-idf:000]: W (72580) AUDIO_PIPELINE: There are no listener registered
[D][esp-idf:000]: I (72583) AUDIO_PIPELINE: audio_pipeline_unlinked
[D][esp-idf:000]: W (72585) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: I (72587) I2S: DMA queue destroyed
[D][esp-idf:000]: W (72589) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: W (72593) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[W][component:214]: Component voice_assistant took a long time for an operation (0.21 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][voice_assistant:517]: Event Type: 2
[D][voice_assistant:607]: Assist Pipeline ended
[D][voice_assistant:517]: Event Type: 1
[D][voice_assistant:520]: Assist Pipeline running
[D][esp_adf.microphone:276]: Microphone stopped
[D][voice_assistant:517]: Event Type: 9
[D][voice_assistant:410]: State changed from STOPPING_MICROPHONE to IDLE
[D][voice_assistant:517]: Event Type: 0
[E][voice_assistant:644]: Error: wake-stream-failed - Unexpected error during wake-word-detection
[D][voice_assistant:410]: State changed from IDLE to START_MICROPHONE
[D][voice_assistant:416]: Desired state set to WAIT_FOR_VAD
[W][component:214]: Component voice_assistant took a long time for an operation (0.21 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][voice_assistant:517]: Event Type: 2
[D][voice_assistant:607]: Assist Pipeline ended
[D][voice_assistant:153]: Starting Microphone
[D][voice_assistant:410]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[D][esp-idf:000]: I (72987) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[D][esp-idf:000]: I (72992) I2S: I2S0, MCLK output by GPIO2
[D][esp-idf:000]: I (72996) AUDIO_PIPELINE: link el->rb, el:0x3d036c2c, tag:i2s, rb:0x3d037040
[D][esp-idf:000]: I (72998) AUDIO_PIPELINE: link el->rb, el:0x3d036da0, tag:filter, rb:0x3d039080
[D][esp-idf:000]: I (73003) AUDIO_ELEMENT: [i2s-0x3d036c2c] Element task created
[D][esp-idf:000]: I (73005) AUDIO_THREAD: The filter task allocate stack on external memory
[D][esp-idf:000]: I (73008) AUDIO_ELEMENT: [filter-0x3d036da0] Element task created
[D][esp-idf:000]: I (73010) AUDIO_ELEMENT: [raw-0x3d036ed0] Element task created
`

Did you manage to resolve it somehow?

Jan 18 '24 16:01 mackowskim

Sadly not, it's still not working.

Jan 20 '24 21:01 sti0

The issue still exists after upgrading to ESPHome 2023.12.9. It makes no difference whether I use openwakeword or porcupine1. In both cases it crashes after I say something (even if it's not the wake word).

Jan 23 '24 20:01 sti0

Exactly the same symptoms here. What's more, I was able to reinstall the original firmware, and it reacts to "OK E.S.P." with no issues. It also understands the commands, so it's not a hardware issue.

Jan 26 '24 10:01 mackowskim

After switching to the local (on-device) wake word, detection works as expected. https://github.com/esphome/firmware/tree/main/wake-word-voice-assistant

Mar 02 '24 22:03 sti0

When switching the wake word engine location from "on device" to "in Home Assistant", the error still exists. So I'm re-opening this ticket: it's solved when using local wake word detection, but one may still want to use the Home Assistant detection.

Mar 02 '24 23:03 sti0
