agents Aggressive transcript mode / text response only mode

Open willsmanley opened this issue 5 months ago • 2 comments

I think a common use case is to toggle between voice and text mode (like in the ChatGPT app among others).

If the goal is to create a multimodal framework that can easily toggle between modalities, it would be great to have a way to disable voice synthesis.

Right now, I am just muting the synthesized voice. This is fine for UX, but it still incurs voice synthesis costs and delays the transcript. We could deliver assistant responses to the user much faster if we streamed the transcript as soon as the text is generated, rather than pacing it to the timing of the synthesized audio.
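
To make the idea concrete, here is a rough sketch of what a text-only toggle could look like. It does not use the framework's actual API: `respond`, `fake_llm_stream`, `fake_tts`, and the `text_only` flag are all hypothetical names made up for illustration. The point is only that in text-only mode each chunk goes straight to the transcript and the synthesizer is never called.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for chunk in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield chunk


async def respond(
    chunks: AsyncIterator[str],
    emit_transcript: Callable[[str], None],
    synthesize: Callable[[str], Awaitable[None]],
    text_only: bool,
) -> None:
    """Forward LLM chunks to the transcript, optionally via voice synthesis."""
    async for chunk in chunks:
        if text_only:
            # Text mode: emit immediately, never touch the synthesizer.
            emit_transcript(chunk)
        else:
            # Voice mode: the transcript waits on synthesis of each chunk.
            await synthesize(chunk)
            emit_transcript(chunk)


def run_demo(text_only: bool) -> Tuple[str, List[str]]:
    """Run one response and return (transcript, chunks sent to synthesis)."""
    transcript: List[str] = []
    tts_calls: List[str] = []

    async def fake_tts(text: str) -> None:
        # Hypothetical synthesizer: records the call and simulates latency.
        tts_calls.append(text)
        await asyncio.sleep(0.01)

    asyncio.run(
        respond(fake_llm_stream(), transcript.append, fake_tts, text_only)
    )
    return "".join(transcript), tts_calls
```

Calling `run_demo(text_only=True)` produces the full transcript with no synthesis calls at all, while `run_demo(text_only=False)` produces the same transcript but only after every chunk has gone through the (simulated) synthesizer.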

These two issues could be solved together or separately. I am working on how this would be handled in the framework, but would love to hear others' thoughts!

Sep 24 '24 17:09 willsmanley
