
AI Voice Support?

Open gamer805 opened this issue 10 months ago • 2 comments

I am working on a project that involves animating a chatbot, and I wish to use rhubarb to animate the lip-syncing. However, after testing with both the PocketSphinx and Phonetic recognizers, neither seems to register individual phonemes; both result in only a rest output. This is the output I have consistently gotten when testing on AI voices:

0.00 X

It should be said that in the same environment, rhubarb does work for recognizing recordings of my own voice, so I assume it has to do with the structure of AI generated audio files themselves. I wonder if support could possibly be added for tweaking sensitivity on the user's end, or if adding support for AI voices may require supporting an entirely new speech recognizer.
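To compare many AI-generated files against real recordings, a small check for this rest-only result can help. The following is only a minimal sketch: it assumes the output is plain text with one cue per line in the form shown above (a start time followed by a mouth shape), and the helper names `parse_cues` and `is_rest_only` are hypothetical, not part of rhubarb.

```python
def parse_cues(text):
    """Parse lines such as '0.00 X' into (start_seconds, shape) pairs.

    Assumes whitespace-separated columns (tab or space); lines that do
    not have exactly two columns are skipped.
    """
    cues = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        cues.append((float(parts[0]), parts[1]))
    return cues


def is_rest_only(cues):
    """Return True if every cue is the rest shape 'X' (or there are no cues)."""
    return all(shape == "X" for _, shape in cues)


if __name__ == "__main__":
    sample = "0.00 X\n"  # the output reported in this issue
    print(is_rest_only(parse_cues(sample)))
```

Running this over the output for each test file would show quickly whether every AI voice file is affected or only some of them.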

gamer805, Feb 20 '25 19:02

That sounds strange. It wouldn't surprise me if the results with AI voices were worse than for regular recordings, but seeing no output at all is unexpected. Could you attach an example file so that I can reproduce the issue?

DanielSWolf, Feb 26 '25 14:02

I also just noticed that synthetic voices are not recognized very well by the PocketSphinx recognizer. Are there perhaps more options now for more accurate recognizers? There are a lot of AI voice models being released.

Just to clarify: I do still get a lot of output, though.

vlrevolution, Apr 11 '25 20:04