ASR often misses the last spoken word?

Open finetunedforgravitas opened this issue 9 months ago • 2 comments

Impressive demo! Thanks for sharing the code. I managed to get GLaDOS running but the ASR often misses the last spoken word:

ASR text: 'Well, what do you like about'

Another time this happened, Llama-3-8B predicted what I had said, which made me really confused lol

TTS text:  What's your favorite thing about the Pantheon? 
ASR text: 'I really like the' 
TTS text: The Pantheon's oculus! 
TTS text:  It's truly a remarkable feature.

The first question I ask has always been picked up in full, which makes me wonder if something is going on with the buffer?

However, it could also be that something is wrong with my computer. I am on Linux (PopOS) and using a Bluetooth microphone (Bluetooth is not always reliable on Linux...). Feel free to close this issue if it's just me experiencing this problem.

finetunedforgravitas avatar May 01 '24 14:05 finetunedforgravitas

Haven't tried it yet, but did you experience this problem when you used a wired mic?

chozillla avatar May 02 '24 06:05 chozillla

Another issue was reported to me on Reddit, involving Whisper 'hallucinations'. This makes me think that the choice of microphone is important. I would really hesitate to try to 'fix' microphone issues in this code base.

Could you try some testing with the Whisper model alone, and see if you have the same issues? The other thing you could try is to increase the "PAUSE_LIMIT" parameter to 600 or so.
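
To illustrate why a longer pause limit might help, here is a minimal sketch of how a silence threshold can cut a recording short. This is not the project's actual code: the function `cut_index`, the 50 ms chunk size, the 300 ms comparison value, and the per-chunk voice-activity flags are all assumptions made for illustration. Only the name "PAUSE_LIMIT" and the value 600 come from the suggestion above.

```python
def cut_index(voiced, chunk_ms, pause_limit_ms):
    """Return how many chunks are kept before recording stops.

    voiced: per-chunk booleans (True = speech detected), hypothetical VAD output.
    Recording stops once accumulated silence after speech reaches pause_limit_ms.
    """
    silence_ms = 0
    started = False
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            started = True
            silence_ms = 0  # any speech resets the silence counter
        elif started:
            silence_ms += chunk_ms
            if silence_ms >= pause_limit_ms:
                return i + 1  # stop here; later chunks are never transcribed
    return len(voiced)


# Hypothetical utterance in 50 ms chunks: a sentence, a 400 ms hesitation,
# the last word, then trailing silence.
voiced = [True] * 20 + [False] * 8 + [True] * 10 + [False] * 20
last_word_start = 28  # index of the first chunk of the last word

short = cut_index(voiced, chunk_ms=50, pause_limit_ms=300)
longer = cut_index(voiced, chunk_ms=50, pause_limit_ms=600)

print(short, short > last_word_start)    # cut happens before the last word
print(longer, longer > last_word_start)  # last word is included
```

With the shorter limit, the 400 ms hesitation is treated as the end of the utterance, so the last word never reaches the ASR. With the longer limit, the hesitation is tolerated and the last word is kept, at the cost of a slower response after you stop talking.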

dnhkng avatar May 02 '24 09:05 dnhkng

Thanks for the suggestions, guys! I tried a wired mic and it seemed better; however, I also ran into PulseAudio-related errors unrelated to this project. Going to close this issue as the problem seems to be with my messed-up audio setup.

finetunedforgravitas avatar May 03 '24 03:05 finetunedforgravitas