Fakhir Ali
listen_stream vs. listen: if only one of them is implemented, that one should be used.
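A minimal sketch of that dispatch, assuming a `BaseEar`-style class with a blocking `listen` and a generator-based `listen_stream` (names modeled on the library, not its exact API): detect whether the subclass overrides the streaming variant and prefer it.

```python
class BaseEar:
    """Hypothetical base class sketch; method names are assumptions."""

    def listen(self):
        # blocking implementation returning a full transcription
        raise NotImplementedError

    def listen_stream(self):
        # streaming implementation yielding partial transcriptions
        raise NotImplementedError

    def transcribe(self):
        # Use listen_stream if (and only if) the subclass overrides it,
        # otherwise fall back to the blocking listen.
        if type(self).listen_stream is not BaseEar.listen_stream:
            return "".join(self.listen_stream())
        return self.listen()


class StreamingEar(BaseEar):
    def listen_stream(self):
        yield from ["hello ", "world"]


class BlockingEar(BaseEar):
    def listen(self):
        return "hello world"
```

Checking `type(self).listen_stream is not BaseEar.listen_stream` means callers never have to know which variant a given ear implements.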
https://github.com/Finity-Alpha/OpenVoiceChat/blob/764a5bf57b524cfbd2eb84a1197126013420d405/openvoicechat/tts/base.py#L129 Here the LLM queue may hold multiple tokens, but the processing (sentence splitting, etc.) is done for every single token. Ideally it is done once for all of the tokens that...
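One way to do this, sketched against Python's standard `queue.Queue` (the helper name is hypothetical): block for the first token, then drain everything already waiting with `get_nowait`, so the expensive sentence-split step runs once per batch rather than once per token.

```python
import queue


def drain_tokens(q):
    """Collect every token currently in the queue in one pass.

    Hypothetical helper illustrating the suggestion: accumulate all
    available LLM tokens before running sentence splitting once.
    """
    tokens = [q.get()]  # block until at least one token is available
    while True:
        try:
            tokens.append(q.get_nowait())  # grab the rest without blocking
        except queue.Empty:
            break
    return "".join(tokens)


q = queue.Queue()
for t in ["Hello", " world", ". How", " are you?"]:
    q.put(t)

text = drain_tokens(q)  # one string; split into sentences once, not four times
```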
The ElevenLabs streaming latency is hardcoded to 4; there should be a parameter to change it. https://github.com/Finity-Alpha/OpenVoiceChat/blob/513b1d014876bb3e2909b3fd1044c352b2729760/openvoicechat/tts/tts_elevenlabs.py#L28
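A sketch of exposing it as a constructor argument; the class name and URL shape are assumptions, but `optimize_streaming_latency` (0-4) is the query parameter ElevenLabs' streaming endpoint documents.

```python
from urllib.parse import urlencode


class MouthElevenLabs:
    """Hedged sketch: attribute and class names are illustrative,
    not the library's exact API."""

    BASE = "https://api.elevenlabs.io/v1/text-to-speech"

    def __init__(self, voice_id, optimize_streaming_latency=4):
        # previously hardcoded; now caller-configurable (0 = quality, 4 = speed)
        self.voice_id = voice_id
        self.optimize_streaming_latency = optimize_streaming_latency

    def stream_url(self):
        params = urlencode(
            {"optimize_streaming_latency": self.optimize_streaming_latency}
        )
        return f"{self.BASE}/{self.voice_id}/stream?{params}"


mouth = MouthElevenLabs("voice123", optimize_streaming_latency=2)
```

Defaulting to the current value of 4 keeps existing behavior unchanged.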
This would further reduce perceived latency.
Send the LLM request before the silence is completely detected. For example, if the silence threshold is 2 s, send an LLM request with all of the available transcription after 1 s...
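The idea can be sketched as a small speculative-dispatch policy (all names and thresholds here are illustrative): fire a provisional LLM request at half the silence threshold, commit to its response once the full threshold elapses, and discard it if speech resumes first.

```python
SILENCE_THRESHOLD = 2.0            # seconds of silence that end a turn
SPECULATIVE_AT = SILENCE_THRESHOLD / 2


def decide(silence_so_far, request_in_flight):
    """Return the action to take given how long the user has been silent."""
    if silence_so_far >= SILENCE_THRESHOLD:
        return "commit"            # the speculative response becomes the reply
    if silence_so_far >= SPECULATIVE_AT and not request_in_flight:
        return "send_speculative"  # start the LLM call early
    return "wait"


def on_speech_resumed(request_in_flight):
    # If the user starts talking again before the full threshold,
    # any in-flight speculative request must be thrown away.
    return "cancel" if request_in_flight else "noop"
```

The trade-off is extra (sometimes wasted) LLM calls in exchange for hiding most of the model's time-to-first-token behind the tail of the silence window.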
Repeated complex jumbled code.
https://www.twilio.com/docs/voice/media-streams/websocket-messages#send-a-mark-message This is how Twilio synchronizes the audio pipeline.
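For reference, a mark message is just a small JSON payload on the media-stream websocket (shape per the Twilio docs linked above); Twilio echoes the mark back once all audio queued before it has finished playing, which is what makes synchronization possible. A minimal builder:

```python
import json


def mark_message(stream_sid, name):
    """Build a Twilio Media Streams 'mark' message.

    Twilio sends a matching 'mark' event back once every media message
    queued before this one has been played to the caller.
    """
    return json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": name},
    })


# hypothetical usage: send after the last audio chunk of a TTS response
msg = mark_message("MZXXXX", "end_of_response")
```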
Where should the buffering happen? When it is on-device, buffering happens in BaseMouth. When it is on the web, it should happen at the client. When it is on a call...
There should be an audio handler that handles both audio input and output; having a separate listener and player doesn't make sense. Also, listening and listening-for-interruption should be states instead of...
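A sketch of that unification, assuming hypothetical names (`AudioHandler`, `AudioState`): one object owns input and output, and listening / speaking / listening-for-interruption become an explicit state machine instead of separate objects.

```python
from enum import Enum, auto


class AudioState(Enum):
    IDLE = auto()
    LISTENING = auto()
    SPEAKING = auto()                    # playing TTS, mic ignored
    LISTENING_FOR_INTERRUPTION = auto()  # playing TTS while watching the mic


class AudioHandler:
    """Hedged sketch: a single handler for input and output;
    names are assumptions, not the library's API."""

    def __init__(self):
        self.state = AudioState.IDLE

    def start_listening(self):
        self.state = AudioState.LISTENING

    def start_speaking(self, watch_for_interruption=True):
        self.state = (AudioState.LISTENING_FOR_INTERRUPTION
                      if watch_for_interruption else AudioState.SPEAKING)

    def on_interruption(self):
        # a barge-in flips the handler straight back to listening
        if self.state is AudioState.LISTENING_FOR_INTERRUPTION:
            self.state = AudioState.LISTENING


h = AudioHandler()
h.start_speaking()   # -> LISTENING_FOR_INTERRUPTION
h.on_interruption()  # -> LISTENING
```

Making the modes explicit states also gives interruption handling a single place to live, rather than being split across a listener and a player.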