Too slow for realtime, tips on speeding it up
I'm getting 2.75x realtime on a 3090, which is too slow for realtime use. Any tips? I'm willing to edit or retrain the model if needed. I have built a streaming engine for it, as well as better voice cloning via audio normalization and more. But to be usable it needs to be much faster. Are there any papers on this, or any general tips for speeding it up? Dynamic quantization didn't quite work.
Check my OpenAI API, it's a few posts down. It should be faster than that. I run on an A6000, which is slower than a 3090.
Move the backend inference code to vLLM or TRT-LLM instead of basic HF Transformers inference (lots more work).
Use streaming https://github.com/davidbrowne17/chatterbox-streaming ;)
@davidbrowne17 you're a legend
@davidbrowne17 still very slow for me, 2.8x realtime on a 3090
The README advertises their paid service for ultra-low latency of sub 200ms, ideal for production use in agents, applications, or interactive media.
Can anyone confirm if adding chatterbox support to fastrtc would handle the speed issue, since the streaming can be handled in fastrtc?
I'm confused. You are getting 2.75x realtime. Meaning for every minute of conversion you get 2.75 minutes of audio and that's bad?
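The confusion comes from "2.75x realtime" having two possible readings: processing time divided by audio length (where above 1 is slower than realtime), or audio length divided by processing time (where above 1 is faster). A minimal sketch of both conventions follows. The function names are hypothetical and the numbers are illustrative, not measurements from this thread.

```python
def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than realtime."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speed_multiple(processing_seconds: float, audio_seconds: float) -> float:
    """Seconds of audio produced per second of processing; above 1.0 is faster than realtime."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Reading 1: 60 s of compute yields 165 s of audio (faster than realtime).
fast_case = speed_multiple(processing_seconds=60.0, audio_seconds=165.0)

# Reading 2: 165 s of compute yields 60 s of audio (slower than realtime).
slow_case = realtime_factor(processing_seconds=165.0, audio_seconds=60.0)

print(f"reading 1: {fast_case:.2f}x faster than realtime")
print(f"reading 2: {slow_case:.2f}x slower than realtime")
```

Both readings print 2.75, so it helps to state which ratio is being reported along with the raw processing time and audio duration.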