parler-tts
Feature improvements
Hello, I tried the Parler-TTS Mini model and it exceeded my expectations with very good results.
However, I have some questions and possible suggestions for improvement:
- Will there be a multi-lingual version available, such as support for Mandarin?
- Currently, the accuracy of numbers and punctuation marks is not very good, and there are instances where words are dropped in sentences. Will these issues be addressed in future versions?
Thanks for the feedback!
- It's not planned yet, but I'd be happy to support any efforts on this!
- In terms of numbers, the training dataset contains no digits and only uses fully written-out numbers. There are some issues with punctuation that will be addressed in the next versions!
In terms of dropping words, this is only a v0.1 version so I'd expect the upcoming versions to be better. One thing that I noticed though is that you should always finish your prompt with a punctuation mark, otherwise the model drops the last word!
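For anyone hitting the same issues, here is a minimal pre-processing sketch for the two workarounds mentioned above: spelling out digits and making sure the prompt ends with a punctuation mark. It is not part of Parler-TTS; `spell_out` and `normalize_prompt` are hypothetical helper names, and the number handling only covers standalone integers from 0 to 999 (decimals, ordinals, and longer numbers are left untouched).

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
TERMINAL_PUNCTUATION = (".", "!", "?")

# Standalone 1-3 digit integers: not glued to letters/digits, not part of a decimal.
SMALL_INT = re.compile(r"(?<![\w.,])\d{1,3}(?!\w|[.,]\d)")


def spell_out(n: int) -> str:
    """Spell an integer in the range 0..999 in English words."""
    if not 0 <= n <= 999:
        raise ValueError("spell_out only handles 0..999 in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + spell_out(rest) if rest else "")


def normalize_prompt(prompt: str, default_mark: str = ".") -> str:
    """Replace small standalone integers with words and ensure the
    prompt ends with a punctuation mark (assumed workaround, see thread)."""
    text = SMALL_INT.sub(lambda m: spell_out(int(m.group())), prompt).rstrip()
    if text and not text.endswith(TERMINAL_PUNCTUATION):
        text += default_mark
    return text


print(normalize_prompt("I bought 42 apples"))
```

You would then pass the normalized string as the text prompt instead of the raw one. For real use, a dedicated text-normalization library is probably a better choice than this toy number speller.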
Thank you.
I'm still wondering what kind of data should be prepared to support Chinese, Japanese, or Korean. I think it would greatly benefit the open-source community if an example could be added for any of these non-English languages!
@lucasjinreal, we need 3 things:
- A dataset of audio and transcription samples, moderately clean, with audio samples from 5s-ish to 30s
- Adapting the DataSpeech recipe for the language(s) of this dataset and running it on the dataset
- Choosing a prompt tokenizer adapted to these languages (and English if you want to mix languages)
Let me know if that helps!
Note that step 2 should be quite achievable. We mainly need to find phonemizer(s) adapted to the languages in order to compute the speaking rate.
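To make the first two points a bit more concrete, here is a rough sketch of a duration filter and a speaking-rate computation. It assumes speaking rate means phonemes per second, which is one plausible reading and may differ from what the actual DataSpeech recipe does. `Sample`, `keep_by_duration`, `speaking_rate`, and `toy_phonemize` are hypothetical names, and the toy phonemizer (one unit per non-space character) is only a stand-in for a real language-specific phonemizer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    text: str          # transcription of the clip
    duration_s: float  # audio length in seconds


def keep_by_duration(samples: Iterable[Sample],
                     min_s: float = 5.0,
                     max_s: float = 30.0) -> List[Sample]:
    """Keep clips in the roughly 5s to 30s range suggested in the thread."""
    return [s for s in samples if min_s <= s.duration_s <= max_s]


def speaking_rate(sample: Sample,
                  phonemize: Callable[[str], List[str]]) -> float:
    """Phonemes per second, given a language-specific phonemizer callable."""
    if sample.duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(phonemize(sample.text)) / sample.duration_s


def toy_phonemize(text: str) -> List[str]:
    """Stand-in only: treats every non-space character as one phoneme."""
    return [ch for ch in text if not ch.isspace()]


samples = [
    Sample("too short", 1.2),
    Sample("a clip of reasonable length for training", 8.0),
    Sample("far too long", 45.0),
]
for s in keep_by_duration(samples):
    print(round(speaking_rate(s, toy_phonemize), 2), s.text)
```

Swapping `toy_phonemize` for a real phonemizer for Mandarin, Japanese, or Korean is the part that needs actual language-specific work.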
Have you been able to get this working @lucasjinreal ?
Hi, I haven't had time to try this recently. Has there been any progress on the Parler side toward supporting more languages?