Shivam Mehta comments

Results: 47 comments of Shivam Mehta

Is there any experiment on a Chinese dataset?

I think the dataset size and training should be enough.

> 4. the n_vocab: 50 of the symbols, is there any influence?

Do you really have only 50 symbols? I feel...

cannot synthesize the pronunciation of single word

It works for me. The audio is just shorter than one second, so the Gradio interface shows 0:00! But if you press the play button you should hear it saying...

cannot synthesize the pronunciation of single word

Yes! That would be the only solution, as it is not an issue. The model synthesises what is asked of it; it's just that the generated audio is short.
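To illustrate why a sub-second clip can show up as 0:00, here is a minimal sketch. It assumes the player formats the duration as m:ss by truncating to whole seconds, and it assumes a 22050 Hz sample rate; neither detail is confirmed in the thread, and the function names are hypothetical.

```python
def clip_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a mono clip in seconds."""
    return num_samples / sample_rate


def format_mmss(seconds: float) -> str:
    """Format a duration as m:ss, dropping the fractional part (assumed player behaviour)."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


sample_rate = 22050   # assumed sample rate
num_samples = 17640   # hypothetical clip length for a single short word
duration = clip_duration_seconds(num_samples, sample_rate)
label = format_mmss(duration)
print(duration, label)
```

Under these assumptions the clip is non-empty even though the label reads zero, which matches the explanation above: the audio exists, it is just short.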

Matcha-TTS app error: symbol_id = _symbol_to_id[symbol] KeyError: '('

This is so strange; it should be ignored when doing the lookup! But if the symbol is not in the symbols file, it won't generate it. https://github.com/shivammehta25/Matcha-TTS/blob/main/matcha/text/symbols.py

Matcha-TTS app error: symbol_id = _symbol_to_id[symbol] KeyError: '('

This seems to be correct! I guess one way would be to surround the call with a try/except! https://github.com/shivammehta25/Matcha-TTS/blob/d31cd92a6122fb99987715248941c96744bf0a36/matcha/text/__init__.py#L22
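The try/except idea could look roughly like the sketch below. The symbol table here is a tiny hypothetical stand-in, not the contents of the real symbols file, and `cleaned_text_to_ids` is an illustrative name rather than the repository's function; only the `_symbol_to_id[symbol]` lookup comes from the error message.

```python
# Hypothetical, tiny symbol table for illustration only.
_symbols = ["_", " ", "a", "b", "c", "h", "i"]
_symbol_to_id = {s: i for i, s in enumerate(_symbols)}


def cleaned_text_to_ids(text):
    """Map each character to its ID, skipping characters missing from the table."""
    ids, skipped = [], []
    for symbol in text:
        try:
            ids.append(_symbol_to_id[symbol])
        except KeyError:
            # Unknown symbols such as '(' are collected instead of crashing the app.
            skipped.append(symbol)
    return ids, skipped


ids, skipped = cleaned_text_to_ids("(hi)")
print(ids, skipped)
```

Returning the skipped symbols alongside the IDs is one possible design choice: it keeps synthesis running while still making it visible that some input characters were dropped.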

Matcha-TTS app error: symbol_id = _symbol_to_id[symbol] KeyError: '('

I am really sorry, I seem to have missed this question. Yes:

> If I add a new character do I have to modify the tokens number in the text...

Matcha compared to Vits

Hi! That is a cool experiment. Did you fine-tune the vocoder too? I am asking because VITS has a built-in vocoder as it is an end-to-end TTS...

Starting from N(mu, I) or starting from N(0, I): which is better?

In practice, we concatenate both mu and the random noise in the U-Net (https://github.com/shivammehta25/Matcha-TTS/blob/108906c603fad5055f2649b3fd71d2bbdf222eac/matcha/models/components/decoder.py#L384), so we didn't see much difference, but the conditional flow matching framework is not dependent...
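As a rough picture of what concatenating mu with the noise means, the toy sketch below stacks a noise sample drawn from N(0, I) and a conditioning mu along the channel axis. Nested lists stand in for tensors, there is no batch dimension, and all names and shapes are illustrative assumptions rather than the decoder's actual code.

```python
import random


def sample_noise(n_feats, n_frames, rng):
    """Draw a [n_feats][n_frames] sample from a standard normal, N(0, I)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_frames)] for _ in range(n_feats)]


def concat_channels(x, mu):
    """Stack x and mu along the channel axis: [C][T] + [C][T] -> [2C][T]."""
    if len(x[0]) != len(mu[0]):
        raise ValueError("x and mu must have the same number of frames")
    return x + mu


rng = random.Random(0)
n_feats, n_frames = 3, 4                          # toy sizes, not real mel dimensions
mu = [[0.5] * n_frames for _ in range(n_feats)]   # stand-in for the encoder output
x0 = sample_noise(n_feats, n_frames, rng)         # starting point sampled from N(0, I)
decoder_input = concat_channels(x0, mu)
print(len(decoder_input), len(decoder_input[0]))
```

Because mu is available to the network as extra input channels either way, the starting noise does not have to be centred on mu for the decoder to see it.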

A successful fa/en implementation report

This is so great to hear! I appreciate you guys experimenting with it :D

the motivation for inserting blank IDs between the input IPA-ids?

Hello, that is a great question! **TLDR:** The idea comes from using multiple states per phone in Hidden Markov Model (HMM) based speech synthesisers for better modelling. \[Our previous work...
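For readers unfamiliar with the operation being asked about, inserting blank IDs between input IDs can be sketched as follows. The helper name and the choice of 0 as the blank ID are assumptions made for illustration, and the input IDs are made up.

```python
def intersperse(ids, blank_id=0):
    """Place blank_id before, between, and after every element of ids."""
    result = [blank_id] * (len(ids) * 2 + 1)
    result[1::2] = ids  # original IDs land on the odd positions
    return result


phone_ids = [5, 9, 7]                 # hypothetical IPA symbol IDs
with_blanks = intersperse(phone_ids)
print(with_blanks)
```

A sequence of n IDs becomes 2n + 1 positions, so each phone has neighbouring blank positions, loosely comparable to giving each phone more than one state.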