Does the model support Arabic?
I tested Urdu and was somehow able to generate a listenable voice, but it sounds robotic and is mixed with noise.
@humair-m do you know of other models that support Urdu TTS?
I am also searching for that. I found IndicTTS, but the quality is not good; the same goes for the TTS by Meta. No open-source SOTA TTS is available for Urdu.
Because no one dares to work on Urdu. We are just copy-pasters; why would anyone else work on Urdu when we don't? Do you know which model we could set up as a baseline and start from there?
I'm training a model on Zia Mohyeddin Sb's voice; let's see how the results turn out.
Yes, you are right. At the moment we are just relying on the big giant companies. I have not found any high-quality audio or even text dataset on the whole of Hugging Face.
Secondly, you asked for a model to start from; most open-source models don't support Urdu.
As far as I know, recent SOTA TTS systems use a small language model (SLM) as the backbone, and thanks to that backbone we get rid of rule-based guidance, which matters because, as you know, Urdu is highly complex. Since there is no such model trained on large Urdu data (I think due to the scarcity of open-source Urdu data), you can't simply fine-tune one. But if you have enough resources, you can do that by getting synthetic data from Gemini Flash TTS or OpenAI TTS.
I am trying to use a Hindi-based model for voice cloning, as Hindi is nearly the same as Urdu when spoken. So let's see what happens. We wish our companies would do something for Urdu; otherwise we'll have to rely on English.
Nope, Google Gemini Flash TTS is very expressive and I love it. I want to create a high-quality Urdu TTS dataset, but as you well know the API is not free and money is required (and I am a student).
I created some Urdu datasets with OpenAI voices, but in my opinion they are not expressive (I get a free but rate-limited API). You can check them here: https://huggingface.co/humair025 Anyway, please share your experiments on Hugging Face.
Yes, the model supports Arabic. (I'm not an Arabic speaker but according to an Arabic speaker the base model's reading is "not eloquent, but [would be accepted] as perfect")
An audio sample from a fine-tuned Arabic model: https://discord.com/channels/1416805195635228795/1418026476921421996/1421271574086025226