StyleTTS

Official Implementation of StyleTTS

Results 15 StyleTTS issues

In meldataset.py, all wav files are resampled to 24000 Hz. However, the MelSpectrogram() transform is called without the `sample_rate` argument, so it defaults to 16000 Hz. ``` to_mel = torchaudio.transforms.MelSpectrogram(...
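
The effect of such a sample-rate mismatch can be illustrated without torchaudio. The following is a minimal sketch, not the repository's code: it assumes an HTK-style mel formula and a filterbank spanning 0 Hz to Nyquist, and `n_mels = 80` is a hypothetical value chosen for illustration. It compares the filter center frequencies intended for 24000 Hz audio with the frequencies the filters effectively cover when the filterbank is built for 16000 Hz but applied to 24000 Hz audio.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centers(sample_rate: int, n_mels: int) -> list[float]:
    """Center frequencies (Hz) of n_mels filters spaced evenly in mel, 0 Hz to Nyquist."""
    mel_max = hz_to_mel(sample_rate / 2)
    step = mel_max / (n_mels + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_mels)]


def effective_centers(assumed_sr: int, actual_sr: int, n_mels: int) -> list[float]:
    """Frequencies the filters really cover when the filterbank is built for
    assumed_sr but applied to an STFT of audio sampled at actual_sr.

    FFT bin k is treated as k * assumed_sr / n_fft Hz, while it actually holds
    k * actual_sr / n_fft Hz, so every center is scaled by actual_sr / assumed_sr.
    """
    scale = actual_sr / assumed_sr
    return [f * scale for f in mel_centers(assumed_sr, n_mels)]


if __name__ == "__main__":
    n_mels = 80  # hypothetical value for illustration
    intended = mel_centers(24000, n_mels)
    effective = effective_centers(16000, 24000, n_mels)
    for i in (0, n_mels // 2, n_mels - 1):
        print(f"filter {i:2d}: intended {intended[i]:8.1f} Hz, "
              f"effective {effective[i]:8.1f} Hz")
```

In torchaudio, the `sample_rate` keyword of `MelSpectrogram` controls the rate the filterbank is built for. Whether passing 24000 is appropriate in this repository depends on how the released checkpoints were trained, which the issue does not state.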

Mandarin support?

help wanted

Hello, I found a difference between the audio generated from the provided demo notebook using LibriSpeech and the audio available on the web page. The generated audio lacks naturalness compared to...

Marathi support? How can I change [word_index_dict.txt](https://github.com/yl4579/AuxiliaryASR/blob/main/word_index_dict.txt)?

Hi, although the voiced/unvoiced (UV) detector part has been removed from the pitch detector code, is the UV detector still pretrained in the provided JDCnet checkpoint? In other words, if I add...

![image](https://github.com/yl4579/StyleTTS/assets/81943524/1a696d7c-e628-4534-aafa-3bcfa33bd998) From what I found when searching, the cause is that InstanceNorm1d is not supported.

Continue the epoch count from the pre-trained model to avoid overwriting the already saved pre-trained checkpoint. For example, if the pre-trained model was saved at the 6th epoch, training continues from the 7th epoch.
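
The idea of resuming the epoch count can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: the `"epoch"` key in the checkpoint dictionary, the helper names, and the file-name pattern are all hypothetical.

```python
def resume_epochs(checkpoint: dict, total_epochs: int) -> range:
    """Return the epoch numbers still to be trained.

    Assumes (hypothetically) that the checkpoint stores the last completed
    epoch under the key "epoch"; a missing key means training from scratch.
    """
    last_epoch = int(checkpoint.get("epoch", 0))
    return range(last_epoch + 1, total_epochs + 1)


def checkpoint_name(epoch: int) -> str:
    """Hypothetical file-name pattern, one file per epoch."""
    return f"epoch_{epoch:05d}.pth"


if __name__ == "__main__":
    pretrained = {"epoch": 6}  # model saved after the 6th epoch
    for epoch in resume_epochs(pretrained, total_epochs=10):
        # Training for this epoch would run here; saving under the epoch's
        # own name leaves the pre-trained file for epoch 6 untouched.
        print("would save", checkpoint_name(epoch))
```

Because numbering starts at the epoch after the saved one, files written during continued training never reuse the pre-trained checkpoint's name.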

Hi, I just ran your code and fixed a few things (attrdict and espeak-ng were missing from the setup). I also made a Colab notebook for LJSpeech model inference. Thanks.

It's great work. Does it support generating the phoneme delayed time sequence? Or is there any tool we can use to get the phoneme delayed time sequence?

I'm working on modifying some of the second-stage networks, so it would be more convenient for me if the first-stage pretrained model were provided. Thank you very much!