pflowtts_pytorch
Make it multi-language?
I was wondering whether "injecting" language info would be possible, similar to what XTTS does by adding a special language token (e.g. [en]) to the GPT input.
Features from a 3-sec speech prompt might not be enough (nor desirable) for capturing the language of the input text, which matters for cross-language speaker cloning. However, concatenating the speech prompt with some kind of language ID (a precomputed language feature vector?) might enable ML (multi-language) in addition to MS (multi-speaker).
At inference, changing this part of the prompt might enable inline language switching.
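To make the idea concrete, here is a minimal PyTorch sketch of what I have in mind. It is not based on the actual pflowtts_pytorch code: the class name `LanguagePromptPrepender`, the `LANG_IDS` table, the hidden size of 192 and the prompt length are all made up for illustration. It also assumes the speech prompt has already been projected to the encoder's hidden dimension, and it uses a learned embedding rather than a precomputed language vector.

```python
import torch
import torch.nn as nn


class LanguagePromptPrepender(nn.Module):
    """Hypothetical helper: prepends one learned language vector to the speech prompt."""

    def __init__(self, num_languages: int, dim: int):
        super().__init__()
        # One trainable vector per language, same width as the prompt features.
        self.lang_emb = nn.Embedding(num_languages, dim)

    def forward(self, prompt: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # prompt:  (batch, frames, dim) speech prompt features
        # lang_id: (batch,) integer language indices
        lang_vec = self.lang_emb(lang_id).unsqueeze(1)  # (batch, 1, dim)
        # Concatenate along the time axis, so the prompt grows by one position.
        return torch.cat([lang_vec, prompt], dim=1)


# Illustrative language table and sizes (assumptions, not values from the repo).
LANG_IDS = {"en": 0, "pl": 1, "de": 2}

torch.manual_seed(0)
module = LanguagePromptPrepender(num_languages=len(LANG_IDS), dim=192)
prompt = torch.randn(2, 264, 192)  # two prompts of 264 frames each
lang = torch.tensor([LANG_IDS["en"], LANG_IDS["pl"]])

out = module(prompt, lang)
print(out.shape)  # torch.Size([2, 265, 192])
```

With something like this, switching language at inference would only mean passing a different `lang_id` while keeping the same speech prompt. Any prompt length or attention mask used downstream would need to account for the extra position.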
There might be a better way, of course, e.g. passing the info directly to the encoder PreNet? Anyway, it would be great to see this feature. The VITS-based YourTTS does a similar thing.