
Support for real-time TTS!

Open czuzu opened this issue 9 months ago • 4 comments

Hello,

I've set up TGUI with the alltalk_tts extension locally and modified the setup so that LLM replies are passed to the extension as they are being generated (stream mode), which enables real-time (aka "incremental") TTS.

A PR for the extension side is in the backlog as well, and streaming TTS is working as expected locally. This PR covers the parts of TGUI I needed to adjust/extend for this to work smoothly.

Two main changes were needed:

  1. Add an output_modifier_stream handler for extensions (currently works only in chat mode), as the enabler for streaming the LLM text to extensions
  2. Apply chat HTML updates structurally and "incrementally" ("diff" mode), updating only what's needed via JS. This was necessary because "audio" elements in the chat HTML were previously re-rendered continuously, which made audio streaming impossible
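To illustrate change 1, here is a minimal sketch of what an extension-side handler for the proposed output_modifier_stream hook might look like. The exact signature is an assumption (TGUI's existing output_modifier receives the full text plus a state dict); the idea is that the hook is called repeatedly with the growing partial reply, and the extension forwards only the newly generated suffix to its audio pipeline:

```python
# Sketch of an extension-side output_modifier_stream handler.
# Assumption: the hook is called with the cumulative partial reply
# each time new tokens arrive, like output_modifier but mid-generation.

buffered = []  # deltas seen so far for the current reply

def output_modifier_stream(partial_reply, state):
    """Receive the partial LLM reply; forward only the new suffix."""
    previous = "".join(buffered)
    delta = partial_reply[len(previous):]  # text added since last call
    if delta:
        buffered.append(delta)
        # hypothetical: enqueue delta for TTS, e.g. enqueue_for_tts(delta)
    return partial_reply  # must return the (possibly modified) text
```

The returned text flows back into the normal chat rendering path, so a TTS extension can leave it unchanged and only side-channel the delta to the synthesiser.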

(The rest are miscellaneous changes: adding a llama3 instruction template and a commented-out line to allow remote debugging of TGUI.)

Let me know what you think. By the way, nice project! Thanks!


czuzu avatar May 08 '24 09:05 czuzu

Please support instruct mode

hypersniper05 avatar May 17 '24 03:05 hypersniper05

@oobabooga please consider this 🙏

hypersniper05 avatar May 17 '24 03:05 hypersniper05

Hey @czuzu, would you consider making this for SillyTavern? Given that you listed only two changes for it, I thought I'd just ask, if it's no trouble.

bobcate avatar May 25 '24 14:05 bobcate

I gotta check the PR list more often, this is something I've needed for a while. Thank God textgen is open source and I can implement these changes on my own rig. Ty❤️❤️

RandomInternetPreson avatar Aug 30 '24 12:08 RandomInternetPreson