rhubarb-lip-sync

Real-time animation for TTS output

Open • sway4em opened this issue 1 year ago • 1 comment

Hi, I'm working on an app where an LLM response is converted into speech using TTS, and the audio is played alongside an animation of a character moving its mouth. Is there a way to use your library for this, or could you point me to a better option? Another option I'm considering is to create an ASCII animation and move the lips up and down based on the waveform. Do you know how I might approach this? I understand that simply aligning the lips to the crests and troughs of the waveform doesn't work. Thanks
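A minimal sketch of the waveform idea from the question, written in Python. Instead of following individual crests and troughs, it measures the short-term loudness (RMS) of the audio covered by each animation frame, smooths that envelope, and picks one of a few ASCII mouth glyphs. It assumes mono float samples in [-1, 1]. The glyphs, frame rate, `reference` level, and smoothing constants are hypothetical values to tune, and none of this is part of Rhubarb.

```python
import math
from typing import List, Sequence

# Hypothetical ASCII mouth glyphs, from closed (index 0) to wide open (index 3).
MOUTHS = ["---", "-o-", "(o)", "(O)"]


def frame_rms(samples: Sequence[float], sample_rate: int, fps: int = 30) -> List[float]:
    """RMS loudness of the audio covered by each animation frame."""
    hop = max(1, sample_rate // fps)  # audio samples per animation frame
    levels = []
    for start in range(0, len(samples), hop):
        chunk = samples[start:start + hop]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def mouth_frames(levels: Sequence[float], reference: float = 0.3,
                 attack: float = 0.6, release: float = 0.2) -> List[str]:
    """Map per-frame loudness to mouth glyphs.

    `reference` is the RMS level treated as "fully open" (an assumed value to tune).
    The envelope opens quickly (`attack`) and closes slowly (`release`),
    which hides some of the frame-to-frame jitter of raw loudness.
    """
    smoothed = 0.0
    frames = []
    for level in levels:
        target = min(1.0, level / reference)  # normalize and clamp to [0, 1]
        rate = attack if target > smoothed else release
        smoothed += rate * (target - smoothed)
        index = min(len(MOUTHS) - 1, int(smoothed * len(MOUTHS)))
        frames.append(MOUTHS[index])
    return frames


if __name__ == "__main__":
    sr = 16000
    # 0.5 s of silence followed by 0.5 s of a 440 Hz tone standing in for speech
    audio = [0.0] * 8000 + [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(8000)]
    print(" ".join(mouth_frames(frame_rms(audio, sr, fps=20))))
```

Because `reference` is a fixed level rather than the peak of the whole clip, the same loop can run on streaming TTS output chunk by chunk, provided the `smoothed` value is carried over between chunks.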

sway4em • Jan 06 '24 22:01

Rhubarb is optimized for use in production pipelines and doesn't have any real-time support. Regarding alternatives:

  • Opening the mouth based on the power of the audio signal works to a degree, but tends to look rather bad.
  • Ironically, running a simple voice activity detector (VAD) to separate speech segments from pauses, then filling the speech segments with random mouth movements, may even look better.
  • Depending on your TTS system, you may be able to get precise phoneme timings without any extra work. Depending on your C++ skills, you may be able to hack Rhubarb to directly take these timings as input, skipping all the time-consuming speech recognition work.
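As a rough illustration of the second point (not Rhubarb code), here is a Python sketch of "VAD plus random mouth movements". The "VAD" is only an energy threshold with a short hangover, standing in for a proper detector such as WebRTC VAD, and the input is assumed to be a list of per-frame RMS levels computed however you like. The shape letters are borrowed from Rhubarb's mouth-shape naming (X being its idle shape) purely for convenience; `threshold`, `hangover`, and `hold` are hypothetical defaults.

```python
import random
from typing import List, Optional, Sequence

SPEAKING_SHAPES = ["B", "C", "D", "E", "F"]  # open-ish shapes, picked at random
REST_SHAPE = "X"  # idle/closed mouth during pauses


def energy_vad(levels: Sequence[float], threshold: float = 0.02, hangover: int = 3) -> List[bool]:
    """Crude energy-based VAD: a frame counts as speech if its RMS exceeds `threshold`.

    `hangover` keeps speech "on" for a few frames after the level drops,
    so short gaps between words don't make the mouth snap shut.
    """
    flags: List[bool] = []
    remaining = 0
    for level in levels:
        if level > threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


def random_mouth_track(is_speech: Sequence[bool], hold: int = 3,
                       seed: Optional[int] = None) -> List[str]:
    """Fill speech frames with random shapes, each held for `hold` frames; pauses get REST_SHAPE."""
    rng = random.Random(seed)
    track: List[str] = []
    current, age = REST_SHAPE, 0
    for speech in is_speech:
        if not speech:
            current, age = REST_SHAPE, 0
        elif current == REST_SHAPE or age >= hold:
            # Pick a new shape, never repeating the previous one.
            current = rng.choice([s for s in SPEAKING_SHAPES if s != current])
            age = 0
        age += 1
        track.append(current)
    return track


if __name__ == "__main__":
    levels = [0.0] * 5 + [0.1] * 10 + [0.0] * 10  # pause, "speech", pause
    print("".join(random_mouth_track(energy_vad(levels), seed=42)))
```

Holding each random shape for a few frames and never repeating the previous one is a simple way to keep the motion from looking either frozen or jittery.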

DanielSWolf • Jan 07 '24 15:01