StyleTTS issues

MelSpectrogram() and unspecified sampling rate

2

In meldataset.py, all wav files are resampled to 24000 sps. However, the MelSpectrogram() transform is called without a `sample_rate` argument, so it defaults to 16000 sps: `to_mel = torchaudio.transforms.MelSpectrogram(...`

dsplog
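A minimal sketch of why the omitted `sample_rate` could matter, using only the Python standard library. It assumes the HTK mel formula and a filterbank spanning 0 Hz to half the sample rate, which is how torchaudio behaves by default as far as I know; `N_MELS = 80` and the helper names are illustrative and not taken from the repository. STFT bins are indexed as a fraction of Nyquist, so a filter designed for 16000 sps lands at 1.5 times its intended frequency when applied to 24000 sps audio.

```python
import math

def hz_to_mel(f_hz):
    # HTK mel formula (assumed to match the transform's default mel scale)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers_hz(sample_rate, n_mels, f_min=0.0):
    """Centre frequencies of n_mels triangular filters spanning f_min..sample_rate/2."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(sample_rate / 2)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_mels)]

ASSUMED_SR = 16000  # default used when sample_rate is omitted
ACTUAL_SR = 24000   # rate the issue says the wav files are resampled to
N_MELS = 80         # illustrative value, not taken from the issue

# Where the filterbank believes its filters sit.
assumed = mel_centers_hz(ASSUMED_SR, N_MELS)
# Where those same filters actually sit on 24000 sps audio.
effective = [f * ACTUAL_SR / ASSUMED_SR for f in assumed]
# Where the filters would sit if sample_rate=24000 were passed.
intended = mel_centers_hz(ACTUAL_SR, N_MELS)

print(f"top filter: assumed {assumed[-1]:.0f} Hz, "
      f"effective {effective[-1]:.0f} Hz, intended {intended[-1]:.0f} Hz")
```

The filters still cover the full band of the 24000 sps audio, but their spacing is no longer a true mel spacing. Passing `sample_rate=24000` to the transform would remove the mismatch; whether the mismatch affects the pretrained models in practice is not something this sketch establishes.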

Mandarin support?

102

Mandarin support?

lucasjinreal

help wanted

Voice Quality issue using Librispeech

Hello, I found a difference between the audio generated from the provided demo notebook using LibriSpeech and the audio available on the web page. The generated audio lacks naturalness compared to...

Anshu-Kumar-1

Marathi support?

Marathi support? How do I change [word_index_dict.txt](https://github.com/yl4579/AuxiliaryASR/blob/main/word_index_dict.txt)?

raushanagrawal

Is the UV detector trained in the pretrained pitch detector?

Hi, although the voiced/unvoiced (UV) detector part has been removed from the pitch detector code, is the UV detector still pretrained in your provided JDCnet checkpoint? In other words, if I add...

auspicious3000

Has anyone had this problem when converting to onnx?

![image](https://github.com/yl4579/StyleTTS/assets/81943524/1a696d7c-e628-4534-aafa-3bcfa33bd998) When I searched for the error, the cause given was that InstanceNorm1d is not supported.

bobo-paopao

Continue from the correct epoch number when resuming from a pretrained model

Continue from the correct epoch number when resuming from a pre-trained model, to avoid overwriting the existing pre-trained checkpoint. For example, if the pre-trained model was saved at the 6th epoch, training continues with the 7th epoch.

magicse
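The resume behaviour described above can be sketched as follows. This is an illustration only: it assumes the checkpoint is a dict carrying an `"epoch"` key and that epochs are numbered from 1, and the names `resume_start_epoch`, `epochs_to_run`, and `checkpoint_name` (including the file-name pattern) are hypothetical, not the repository's code.

```python
def resume_start_epoch(checkpoint, first_epoch=1):
    """Return the epoch to train next: one past the saved epoch, or first_epoch."""
    saved = checkpoint.get("epoch")  # assumed key; absent for a fresh run
    return first_epoch if saved is None else int(saved) + 1

def epochs_to_run(checkpoint, total_epochs):
    """Epoch numbers still to be trained, inclusive of total_epochs."""
    return range(resume_start_epoch(checkpoint), total_epochs + 1)

def checkpoint_name(epoch):
    # Hypothetical naming scheme; including the epoch number keeps each
    # new checkpoint from overwriting the pre-trained one.
    return f"epoch_{epoch:05d}.pth"

pretrained = {"epoch": 6}  # stand-in for a loaded checkpoint dict
for epoch in epochs_to_run(pretrained, total_epochs=8):
    print(epoch, checkpoint_name(epoch))
```

With the saved epoch at 6, the loop starts at epoch 7, so the file written for epoch 6 is left untouched.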

Fix Requirements, Add Colab

10

Hi, I just ran your code and fixed a few things (attrdict and espeak-ng were missing from the setup). I also made a Colab notebook for LJSpeech model inference. Thanks.

nivibilla

Amazing work, can it generate the phoneme delayed time sequence?

It's great work. Does it support generating the phoneme delayed time sequence? Or is there any tool we can use to get the phoneme delayed time sequence?

CasonTsai
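If the question is about per-phoneme timing, one possible approach is to convert per-phoneme durations (in mel frames) into start and end times. The sketch below assumes such durations are available from a duration predictor or an external aligner, which the issue does not confirm; `phoneme_times` is a hypothetical helper, the hop length of 300 is an illustrative value, and 24000 is the sampling rate mentioned in the first issue above.

```python
def phoneme_times(durations_frames, hop_length, sample_rate):
    """Turn per-phoneme frame counts into (start_seconds, end_seconds) pairs."""
    times = []
    start_frame = 0
    for n_frames in durations_frames:
        end_frame = start_frame + n_frames
        # One frame advances hop_length samples, so frames * hop / rate is seconds.
        times.append((start_frame * hop_length / sample_rate,
                      end_frame * hop_length / sample_rate))
        start_frame = end_frame
    return times

# Illustrative input: three phonemes lasting 2, 4 and 3 frames.
for start, end in phoneme_times([2, 4, 3], hop_length=300, sample_rate=24000):
    print(f"{start:.4f} s -> {end:.4f} s")
```

Each phoneme starts where the previous one ends, so the last end time equals the total number of frames multiplied by the frame duration.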

Is there a first-stage pretrained model?

I'm working on modifying some of the second-stage networks, so it would be more convenient for me if a first-stage pretrained model were provided. Thank you very much!

shiyan-liu
