GlaDOS
ASR often misses the last spoken word?
Impressive demo! Thanks for sharing the code. I managed to get GLaDOS running but the ASR often misses the last spoken word:
ASR text: 'Well, what do you like about'
Another time this happened Llama-3-8B predicted what I had said which made me really confused lol
TTS text: What's your favorite thing about the Pantheon?
ASR text: 'I really like the'
TTS text: The Pantheon's oculus!
TTS text: It's truly a remarkable feature.
The first question I ask has always been picked up in full which makes me wonder if something is going on with the buffer?
However it could also be that something is wrong with my computer. I am on Linux (PopOS) and using a bluetooth microphone (bluetooth not always reliable on Linux...). Feel free to close this issue if it's just me experiencing this problem.
Haven't tried it yet, but did you experience this problem when you used a wired mic?
I saw another issue mentioned on Reddit, where they reported Whisper 'hallucinations'. This makes me think that the choice of microphone is important. I would really hesitate to try to 'fix' microphone issues in this code base.
Could you try some testing with just the Whisper model alone, and see if you have the same issues? The other thing you could try is to increase the "PAUSE_LIMIT" parameter to 600 or so.
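To illustrate why a larger pause limit might help, here is a rough sketch of how a silence threshold can cut a recording off before the last word. This is not the actual GLaDOS code: the function `cut_utterance`, the 32 ms frame size, and the assumption that the limit is measured in milliseconds are all made up for the example.

```python
from typing import List

FRAME_MS = 32  # hypothetical duration of one voice-activity frame


def cut_utterance(vad_flags: List[bool], pause_limit_ms: int,
                  frame_ms: int = FRAME_MS) -> int:
    """Return how many frames are kept before the recorder stops listening.

    vad_flags holds one boolean per audio frame (True = speech detected).
    Recording stops once the silence after the first detected speech
    reaches pause_limit_ms.
    """
    silent_ms = 0
    started = False
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            started = True
            silent_ms = 0  # any speech resets the silence counter
        elif started:
            silent_ms += frame_ms
            if silent_ms >= pause_limit_ms:
                return i + 1  # stop here; later frames are never transcribed
    return len(vad_flags)


# A sentence, a 384 ms hesitation, one last word, then trailing silence.
flags = [True] * 10 + [False] * 12 + [True] * 5 + [False] * 30

short_limit = cut_utterance(flags, pause_limit_ms=300)
long_limit = cut_utterance(flags, pause_limit_ms=600)
print(short_limit, long_limit)
```

With the 300 ms limit the recording stops during the hesitation, so the last word (frames 22 to 26) never reaches the ASR model. With 600 ms the hesitation is tolerated and the whole utterance is kept, at the cost of a slightly slower response.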
Thanks for the suggestions, guys! I tried a wired mic and it seemed better; however, I also ran into PulseAudio errors unrelated to this project. Going to close this issue as the problem seems to be with my messed-up audio setup.