supervoice-voicebox
Architecture of Supervoice
Hi, I just saw your repo and am a bit confused about the architecture and philosophy behind your TTS model. Could you please add a little about your architecture? For example, you are training an LLM for TTS, but you also train for duration, which seems new, since most large-model TTS systems rely on the autoregressive model itself for duration.
That said, I will go through your code and try to figure it out myself.