Shivam Mehta
I think the dataset size and training should be enough. > 4. the n_vocab: 50 of the symbols, is there any influence? Do you really have only 50 symbols? I feel...
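One quick way to check "do you really have only 50 symbols?" is to count the distinct characters that actually occur in your own transcripts. This is a minimal sketch. It assumes plain-text transcripts and treats a single padding symbol `_` as the only special token. Both are assumptions, so adjust them to your own symbol set.

```python
from collections import Counter

def symbol_counts(transcripts):
    """Count how often each character occurs across all transcripts."""
    counts = Counter()
    for line in transcripts:
        counts.update(line)  # Counter.update on a str counts each character
    return counts

def required_vocab_size(transcripts, special_tokens=("_",)):
    """Distinct characters in the data plus any special tokens (assumed: a padding symbol)."""
    return len(set(symbol_counts(transcripts)) | set(special_tokens))

transcripts = ["hello world", "hold"]  # hypothetical transcripts
needed = required_vocab_size(transcripts)
n_vocab = 50  # the value quoted in the question
print(needed, n_vocab >= needed)  # 9 True
```

If the count is well above 50, some characters in the data have no id at all.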
It works for me; the audio is just shorter than one second, so the Gradio interface shows 0:00! But if you press the play button, you should hear it saying...
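The clip length is roughly the number of mel frames times the hop length, divided by the sample rate. A timer that truncates to whole seconds therefore shows anything under one second as 0:00. This is a sketch, assuming `hop_length=256` and a 22050 Hz sample rate (LJSpeech-style values, so check your own data config). It also assumes the player truncates rather than rounds.

```python
def frames_to_seconds(n_frames: int, hop_length: int = 256, sample_rate: int = 22050) -> float:
    """Approximate audio length of a mel-spectrogram with n_frames frames.

    hop_length and sample_rate defaults are assumptions; use your own config values.
    """
    return n_frames * hop_length / sample_rate

def as_m_ss(seconds: float) -> str:
    """Format seconds as m:ss, truncating partial seconds (an assumed display rule)."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"

secs = frames_to_seconds(50)              # a hypothetical very short utterance
print(f"{secs:.2f}s ->", as_m_ss(secs))   # 0.58s -> 0:00
```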
Yes! That would be the only solution, as this is not a bug. The model synthesises what it is asked to; the generated audio is just short.
This is strange; it should be ignored during the lookup! But if a symbol is not in the symbols file, it won't be generated. https://github.com/shivammehta25/Matcha-TTS/blob/main/matcha/text/symbols.py
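A symbol that is missing from the symbols file can never be generated. It can therefore help to find such characters in your data before training. This is a minimal sketch. The five-symbol list is made up for illustration and is not the real Matcha-TTS symbol set.

```python
# Hypothetical five-symbol set; the real list is defined in matcha/text/symbols.py.
symbols = ["_", " ", "a", "b", "c"]

def report_unknown(lines, symbols):
    """Return {line_number: sorted unknown characters} for lines using symbols outside the set."""
    known = set(symbols)
    report = {}
    for n, line in enumerate(lines, start=1):
        missing = sorted(set(line) - known)
        if missing:
            report[n] = missing
    return report

print(report_unknown(["ab c", "abç!", "cab"], symbols))  # {2: ['!', 'ç']}
```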
This seems to be correct! I guess one way would be to wrap the call in a try/except block! https://github.com/shivammehta25/Matcha-TTS/blob/d31cd92a6122fb99987715248941c96744bf0a36/matcha/text/__init__.py#L22
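Here is a rough sketch of the try/except idea. It assumes the linked line indexes a symbol-to-id dictionary, which raises `KeyError` for an unknown symbol. The table and the function name `text_to_ids` are hypothetical stand-ins, not the library's code. Unknown symbols are logged and skipped so that one stray character does not crash the whole utterance.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical table standing in for the one built from matcha/text/symbols.py.
_symbol_to_id = {s: i for i, s in enumerate(["_", " ", "h", "i"])}

def text_to_ids(clean_text: str) -> list[int]:
    """Convert cleaned text to ids, wrapping each dictionary lookup in try/except."""
    sequence = []
    for symbol in clean_text:
        try:
            sequence.append(_symbol_to_id[symbol])
        except KeyError:
            # Unknown symbol: warn and continue instead of failing the whole utterance.
            logger.warning("Skipping symbol not in the symbol set: %r", symbol)
    return sequence

print(text_to_ids("hi hi!"))  # [2, 3, 1, 2, 3]
```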
I am really sorry, I seem to have missed this question; the answer is yes: > If I add a new character, do I have to modify the tokens number in the text...
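This is a small sketch of adding a character and keeping the token count in sync. It assumes `n_vocab` sizes the text encoder's embedding table, so it needs one row per symbol id. The base list and the helper `add_symbols` are hypothetical. New characters are appended at the end so that existing ids, and any embeddings already trained for them, keep their positions.

```python
# Hypothetical base symbol list; the real one is in matcha/text/symbols.py.
base_symbols = ["_", " ", "a", "b", "c"]

def add_symbols(symbols, new_chars):
    """Append new characters at the end so existing symbol ids do not shift."""
    extended = list(symbols)
    for ch in new_chars:
        if ch not in extended:
            extended.append(ch)
    return extended

symbols = add_symbols(base_symbols, ["ñ", "a"])  # "a" already exists, so only "ñ" is added
n_vocab = len(symbols)  # must cover every symbol id
print(symbols, n_vocab)  # ['_', ' ', 'a', 'b', 'c', 'ñ'] 6
```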
Hi! That is a cool experiment. Did you fine-tune the vocoder too? The reason I am asking is that VITS has a built-in vocoder, as it is an end-to-end TTS...
In practice, we concatenate both mu and the random noise as input to the U-Net (https://github.com/shivammehta25/Matcha-TTS/blob/108906c603fad5055f2649b3fd71d2bbdf222eac/matcha/models/components/decoder.py#L384), so we didn't see much difference, but the conditional flow matching framework is not dependent...
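This sketch shows what concatenating mu and the noise means for tensor shapes. It is written in NumPy rather than PyTorch and is not the decoder code linked above. It makes three assumptions. First, mu and the target mel share the shape (batch, n_feats, frames). Second, the noisy sample comes from the optimal-transport CFM path with a small `sigma_min`. Third, the stacking is along the channel axis. All shapes and the `sigma_min` value are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_feats, n_frames = 2, 80, 100  # hypothetical: 80 mel bins, 100 frames
sigma_min = 1e-4                       # assumed small value

mu = rng.standard_normal((batch, n_feats, n_frames))  # stand-in for the encoder output
x1 = rng.standard_normal((batch, n_feats, n_frames))  # stand-in for the target mel
x0 = rng.standard_normal((batch, n_feats, n_frames))  # random noise (source sample)
t = rng.uniform(size=(batch, 1, 1))                    # one time step per example

# OT-CFM path from noise to data, and the vector field the network regresses
x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
u_t = x1 - (1 - sigma_min) * x0

# Condition the estimator by stacking x_t and mu along the channel axis
unet_input = np.concatenate([x_t, mu], axis=1)
print(unet_input.shape)  # (2, 160, 100)
```

Because mu is always present as extra input channels, the network is conditioned on it regardless of which noise sample it starts from.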
This is so great to hear! I appreciate you guys experimenting with it :D
Hello, that is a great question! **TLDR:** The idea comes from the use of multiple states per phone in Hidden Markov Model (HMM)-based speech synthesisers, which gives better modelling. \[Our previous work...
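This is a toy illustration of "multiple states per phone". Each phone is split into a fixed number of consecutive left-to-right sub-units, and each sub-unit can get its own duration and output distribution. That lets the model capture how a phone changes over time, from onset through middle to offset. The helper name and `states_per_phone` are hypothetical. This is not how any specific model in the linked work implements it.

```python
def expand_to_states(phone_ids, states_per_phone=3):
    """Return (phone_id, state_index) pairs, left-to-right within each phone."""
    return [(p, s) for p in phone_ids for s in range(states_per_phone)]

print(expand_to_states([7, 12], states_per_phone=2))
# [(7, 0), (7, 1), (12, 0), (12, 1)]
```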