Add F5 TTS
I just saw this TTS model (https://github.com/SWivid/F5-TTS), which works very well for English. Are you planning on including it in the project? Thanks!!
I'm absolutely thinking about that, I love that TTS system. Things currently holding me back:
- Not on PyPI: without a pip install I would need to copy their whole repo, and I don't want that
- No streaming support, so higher latency than with other engines

Both are kind of bummers, but the first is the bigger one. I need a way to install it gracefully with RealtimeTTS.
Hi! @KoljaB, would pip install git+https://github.com/SWivid/F5-TTS.git not suffice as a pip install?
Sadly not, PyPI would not accept this as a dependency in a setup file. So I currently can't integrate it to be installed with "pip install RealtimeTTS[F5]". The lack of streaming support is also still a bummer. It's still my first-pick TTS system that I would love to integrate (together with GPT-SoVITS).
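To make the packaging problem concrete, here is a minimal sketch, assuming the extra would be declared as a PEP 508 direct reference ("name @ url"). The extras table, the extra names, and both helper functions are hypothetical and are not taken from RealtimeTTS's actual setup file. As far as I know, PyPI rejects uploads whose metadata contains such direct references, so a check like this would flag the offending extra before an upload attempt.

```python
# Hypothetical extras table for illustration only; not RealtimeTTS's real setup.py.
EXTRAS = {
    # Direct reference to a Git repository (the form a git+https install would need).
    "f5": ["f5-tts @ git+https://github.com/SWivid/F5-TTS.git"],
    # Ordinary index-resolved requirement, shown for contrast.
    "plain": ["requests>=2.0"],
}


def is_direct_reference(requirement: str) -> bool:
    """Return True if the requirement string has the 'name @ url' form."""
    _name, sep, target = requirement.partition("@")
    return bool(sep) and "://" in target


def extras_with_direct_references(extras: dict) -> list:
    """List the extras that contain at least one direct URL reference."""
    return sorted(
        name
        for name, requirements in extras.items()
        if any(is_direct_reference(req) for req in requirements)
    )


if __name__ == "__main__":
    print(extras_with_direct_references(EXTRAS))
```

Installing locally with "pip install git+https://..." still works for an individual user; the sketch only covers declaring it as a dependency of a package published to the index.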
I looked into the F5-TTS code.
Implementing real-time streaming is not trivial, because the model processes the entire sequence at once using full-sequence attention and ODE integration. That design cannot produce incremental output, so streaming would require significant architectural changes to support the incremental, real-time computation it needs.
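A toy sketch of why this blocks streaming. It is not F5-TTS code: `velocity` is a made-up stand-in for the model, and the only property it borrows is that each frame's update depends on all frames (as with full-sequence attention). With a fixed-step Euler integration over the whole sequence, every frame keeps changing until the last step, so no frame can be emitted early.

```python
def velocity(frames, t):
    """Hypothetical stand-in for the model: each frame's update depends on
    the mean of ALL frames, mimicking full-sequence attention."""
    mean = sum(frames) / len(frames)
    return [mean - x + (1.0 - t) for x in frames]


def euler_sample(noise, steps=8):
    """Integrate from t=0 to t=1 with fixed-step Euler over the whole sequence.

    Returns the final frames and a snapshot of the sequence after each step.
    """
    frames = list(noise)
    snapshots = [list(frames)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(frames, t)
        frames = [x + dt * vx for x, vx in zip(frames, v)]
        snapshots.append(list(frames))
    return frames, snapshots


if __name__ == "__main__":
    final, snapshots = euler_sample([0.5, -1.0, 2.0, 0.0])
    # The first frame is still moving at every step, including the last one.
    for before, after in zip(snapshots, snapshots[1:]):
        print(f"frame 0: {before[0]:.4f} -> {after[0]:.4f}")
```

In this toy, changing only the last input frame also changes the first output frame, which is the dependency that prevents emitting audio chunk by chunk without restructuring the model.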
I have a working version of F5-TTS that streams and uses an Apache-licensed model trained on English. F5-TTS can now also be installed with pip install f5-tts. Perhaps this will help someone out. All info here - https://drive.google.com/drive/folders/1LmAkK08YFWz57e8EF8ZgovqspKFk56xQ?usp=sharing
Wow, nice. I'll definitely look into that soon, thanks a lot.
Hi all, is there any update on adding F5-TTS?
I looked into it again a few weeks ago. It's not ready for real-time implementations.