rhasspy
No intent recognized since v2.5.10
Hello,
I updated from v2.5.7 to the latest v2.5.10. Since the update, intents from speech-to-text (STT) are no longer recognized. If I type the same words into the Rhasspy GUI, it works. I looked at the logs and found this:
Jul 05 21:26:28 raspberrypi rhasspy[20147]: [DEBUG:2021-07-05 21:26:28,935] rhasspydialogue_hermes: -> NluIntentNotRecognized(input='schalte das garagenlicht aus', site_id='default', id=None, custom_data=None, session_id='default-default-3ff18ae4-4dfa-4d4b-a60a-8f5cd7250985')
The words were transcribed correctly ('schalte das garagenlicht aus' is German for 'turn off the garage light'), but intent recognition no longer works. I tried v2.5.9, and there it still works fine.
Here is the complete log: rhasspy_logs.txt
Do you have any hints about what could be wrong? If you need more information, please let me know.
I can confirm this. Sometimes the speech-to-text result, which is also shown on the Rhasspy page, isn't recognized. If I then press the Recognize button (without typing anything), it works!
I also have some logs:
[DEBUG:2021-08-27 21:51:05,624] rhasspyasr_pocketsphinx_hermes: Publishing 1302 bytes(s) to hermes/asr/textCaptured
[DEBUG:2021-08-27 21:51:05,624] rhasspyasr_pocketsphinx_hermes: -> AsrAudioCaptured(97004 byte(s)) to rhasspy/asr/RpiWohn/RpiWohn/audioCaptured
[DEBUG:2021-08-27 21:51:05,630] rhasspydialogue_hermes: <- AsrTextCaptured(text='πότε θα βρέξει', likelihood=0.09395173083272149, seconds=0.7849854379892349, site_id='RpiWohn', session_id='RpiWohn-picovoice_linux-84c85aa0-212d-47e8-b249-b3b349cf1fc4', wakeword_id=None, asr_tokens=[[AsrToken(value='<s>', confidence=0.9998999934083855, range_start=0, range_end=4, time=AsrTokenTime(start=0.0, end=0.06)), AsrToken(value='πότε', confidence=0.9269007371746152, range_start=4, range_end=9, time=AsrTokenTime(start=0.07, end=0.37)), AsrToken(value='<sil>', confidence=0.33282863912399174, range_start=9, range_end=15, time=AsrTokenTime(start=0.38, end=0.4)), AsrToken(value='θα', confidence=0.5445091894643684, range_start=15, range_end=18, time=AsrTokenTime(start=0.41, end=0.54)), AsrToken(value='βρέξει', confidence=0.9998999934083855, range_start=18, range_end=25, time=AsrTokenTime(start=0.55, end=1.25)), AsrToken(value='<sil>', confidence=1.0, range_start=25, range_end=31, time=AsrTokenTime(start=1.26, end=1.53)), AsrToken(value='<sil>', confidence=1.0, range_start=31, range_end=37, time=AsrTokenTime(start=1.54, end=2.0)), AsrToken(value='<sil>', confidence=1.0, range_start=37, range_end=43, time=AsrTokenTime(start=2.01, end=2.24)), AsrToken(value='</s>', confidence=1.0, range_start=43, range_end=48, time=AsrTokenTime(start=2.25, end=2.37))]], lang=None)
[DEBUG:2021-08-27 21:51:05,630] rhasspydialogue_hermes: Playing sound /usr/lib/rhasspy/etc/wav/beep_lo.wav
[DEBUG:2021-08-27 21:51:05,631] rhasspydialogue_hermes: -> HotwordToggleOff(site_id='RpiWohn', reason=<HotwordToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:05,631] rhasspydialogue_hermes: Publishing 44 bytes(s) to hermes/hotword/toggleOff
[DEBUG:2021-08-27 21:51:05,632] rhasspydialogue_hermes: -> AsrToggleOff(site_id='RpiWohn', reason=<AsrToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:05,632] rhasspydialogue_hermes: Publishing 44 bytes(s) to hermes/asr/toggleOff
[DEBUG:2021-08-27 21:51:05,632] rhasspydialogue_hermes: -> AudioPlayBytes(119908 byte(s)) to hermes/audioServer/RpiWohn/playBytes/44e6c78c-c107-4d70-9c53-60a5f85d428b
[DEBUG:2021-08-27 21:51:05,633] rhasspydialogue_hermes: Waiting for playFinished (id=44e6c78c-c107-4d70-9c53-60a5f85d428b, timeout=1.6090022675736961)
[DEBUG:2021-08-27 21:51:05,633] rhasspywake_porcupine_hermes: <- HotwordToggleOff(site_id='RpiWohn', reason=<HotwordToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:05,633] rhasspyasr_pocketsphinx_hermes: <- AsrToggleOff(site_id='RpiWohn', reason=<AsrToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:05,633] rhasspyasr_pocketsphinx_hermes: Disabled
[DEBUG:2021-08-27 21:51:05,633] rhasspywake_porcupine_hermes: Disabled
[DEBUG:2021-08-27 21:51:06,472] rhasspytts_cli_hermes: <- AudioPlayFinished(id='44e6c78c-c107-4d70-9c53-60a5f85d428b', session_id='44e6c78c-c107-4d70-9c53-60a5f85d428b')
[DEBUG:2021-08-27 21:51:06,472] rhasspydialogue_hermes: <- AudioPlayFinished(id='44e6c78c-c107-4d70-9c53-60a5f85d428b', session_id='44e6c78c-c107-4d70-9c53-60a5f85d428b')
[DEBUG:2021-08-27 21:51:06,473] rhasspydialogue_hermes: -> HotwordToggleOn(site_id='RpiWohn', reason=<HotwordToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:06,473] rhasspydialogue_hermes: Publishing 44 bytes(s) to hermes/hotword/toggleOn
[DEBUG:2021-08-27 21:51:06,474] rhasspydialogue_hermes: -> AsrToggleOn(site_id='RpiWohn', reason=<AsrToggleReason.PLAY_AUDIO: 'playAudio'>)
[DEBUG:2021-08-27 21:51:06,474] rhasspydialogue_hermes: Publishing 44 bytes(s) to hermes/asr/toggleOn
[DEBUG:2021-08-27 21:51:06,474] rhasspydialogue_hermes: Received text: πότε θα βρέξει
[DEBUG:2021-08-27 21:51:06,474] rhasspydialogue_hermes: -> AsrStopListening(site_id='RpiWohn', session_id='RpiWohn-picovoice_linux-84c85aa0-212d-47e8-b249-b3b349cf1fc4')
[DEBUG:2021-08-27 21:51:06,474] rhasspydialogue_hermes: Publishing 98 bytes(s) to hermes/asr/stopListening
[DEBUG:2021-08-27 21:51:06,475] rhasspydialogue_hermes: -> HotwordToggleOn(site_id='RpiWohn', reason=<HotwordToggleReason.DIALOGUE_SESSION: 'dialogueSession'>)
[DEBUG:2021-08-27 21:51:06,475] rhasspydialogue_hermes: Publishing 50 bytes(s) to hermes/hotword/toggleOn
[DEBUG:2021-08-27 21:51:06,475] rhasspydialogue_hermes: Transcription is below confidence threshold (0.09395173083272149 < 0.1): πότε θα βρέξει
[DEBUG:2021-08-27 21:51:06,475] rhasspydialogue_hermes: -> NluIntentNotRecognized(input='πότε θα βρέξει', site_id='RpiWohn', id=None, custom_data=None, session_id='RpiWohn-picovoice_linux-84c85aa0-212d-47e8-b249-b3b349cf1fc4')
[DEBUG:2021-08-27 21:51:06,476] rhasspydialogue_hermes: Publishing 157 bytes(s) to hermes/nlu/intentNotRecognized
What's important:
The WAV is transcribed with a per-word confidence between roughly 0.5 and 0.99:
AsrTextCaptured(text='πότε θα βρέξει', likelihood=0.09395173083272149, seconds=0.7849854379892349, site_id='RpiWohn', session_id='RpiWohn-picovoice_linux-84c85aa0-212d-47e8-b249-b3b349cf1fc4', wakeword_id=None, asr_tokens=[[AsrToken(value='<s>', confidence=0.9998999934083855, range_start=0, range_end=4, time=AsrTokenTime(start=0.0, end=0.06)), AsrToken(value='πότε', confidence=0.9269007371746152, range_start=4, range_end=9, time=AsrTokenTime(start=0.07, end=0.37)), AsrToken(value='<sil>', confidence=0.33282863912399174, range_start=9, range_end=15, time=AsrTokenTime(start=0.38, end=0.4)), AsrToken(value='θα', confidence=0.5445091894643684, range_start=15, range_end=18, time=AsrTokenTime(start=0.41, end=0.54)), AsrToken(value='βρέξει', confidence=0.9998999934083855
Is this where the bug is hiding? -> "Transcription is below confidence threshold (0.09395173083272149 < 0.1)"
Lowering the Pocketsphinx minimum confidence to 0 works around the problem...
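To make the decision in the log explicit, here is a minimal sketch of the gate that the message "Transcription is below confidence threshold (x < y)" implies: the utterance-level likelihood from AsrTextCaptured is compared against the configured minimum, and anything below it becomes NluIntentNotRecognized. This is not Rhasspy's actual code; the names AsrResult and passes_confidence_gate are hypothetical, and only the likelihood value and the 0.1 threshold come from the log above.

```python
from dataclasses import dataclass


@dataclass
class AsrResult:
    text: str
    # Utterance-level score, as reported in AsrTextCaptured.likelihood
    likelihood: float


def passes_confidence_gate(result: AsrResult, min_confidence: float) -> bool:
    """Hypothetical sketch: reject when likelihood < min_confidence,
    mirroring the 'below confidence threshold (x < y)' log message."""
    return result.likelihood >= min_confidence


result = AsrResult(text="πότε θα βρέξει", likelihood=0.09395173083272149)

print(passes_confidence_gate(result, 0.1))  # False: 0.0939... < 0.1, intent not recognized
print(passes_confidence_gate(result, 0.0))  # True: threshold lowered to 0
```

This shows why setting the minimum to 0 hides the symptom: every transcription passes the gate, regardless of how the likelihood was computed.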
Maybe the <s> and <sil> tokens shouldn't be there, or shouldn't be counted as part of the transcribed text. I spoke exactly what was transcribed, 'πότε θα βρέξει' ('when will it rain'), and that is also a trained sentence...
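The suggestion above can be sketched as follows: drop the special tokens (<s>, </s>, <sil>) and look only at the confidences of real words. This is a hypothetical alternative score for illustration, not how Rhasspy or Pocketsphinx computes the reported likelihood; the Token class, SPECIAL_TOKENS set, and word_confidences helper are assumptions, and the token values are copied from the AsrTextCaptured log line.

```python
from dataclasses import dataclass
from statistics import mean

# Special tokens seen in the log: sentence start/end markers and silence
SPECIAL_TOKENS = {"<s>", "</s>", "<sil>"}


@dataclass
class Token:
    value: str
    confidence: float


def word_confidences(tokens):
    """Return the confidences of real words only, skipping special tokens."""
    return [t.confidence for t in tokens if t.value not in SPECIAL_TOKENS]


# Token values and confidences taken from the AsrTextCaptured log entry
tokens = [
    Token("<s>", 0.9998999934083855),
    Token("πότε", 0.9269007371746152),
    Token("<sil>", 0.33282863912399174),
    Token("θα", 0.5445091894643684),
    Token("βρέξει", 0.9998999934083855),
    Token("<sil>", 1.0),
    Token("<sil>", 1.0),
    Token("<sil>", 1.0),
    Token("</s>", 1.0),
]

words = word_confidences(tokens)
print(len(words))             # 3
print(round(min(words), 4))   # 0.5445
print(round(mean(words), 4))  # 0.8238
```

Judged by word tokens alone, the lowest confidence is about 0.54, well above the 0.1 threshold, which is what makes the rejection look surprising. Whether such a word-only score would be a sensible replacement for the utterance likelihood is a design question for the maintainers.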
Sorry, I'm only trying to help :D
We are now on 2.5.11, is this still an issue?