Nikita Grebenyuk comments

Results: 47 comments by Nikita Grebenyuk

Questions about 48k audio file train

You may try 44.1 kHz; it worked for me (in config.json, set sampling_rate = 44100). Also make sure your audio is mono, 16-bit WAV.
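The format requirements above can be checked before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the default rate of 44100 are illustrative assumptions, not part of any training repository.

```python
import wave


def check_wav(path, expected_rate=44100):
    """Return a list of format problems found in a WAV file (empty if none).

    check_wav is a hypothetical helper; expected_rate should match the
    sampling_rate value set in config.json.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16 bits
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected 1 channel, got {channels}")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {rate} Hz")
    return problems
```

Running it over every file in the dataset and printing any non-empty result is usually enough to spot stereo or wrongly resampled files.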

Questions about 48k audio file train

> @nikich340 does your speech synthesis have a good result?
> My result is ok but the quality of speech is not so good, there is still noise in it...

Please help me!

Removing wavs won't help you with CUDA OOM (mels are loaded into RAM first); you should reduce the batch size instead (= how many mels are loaded into GPU VRAM at...
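As a rough illustration of reducing the batch size, the sketch below halves the value stored in a JSON config. It assumes the batch size lives under a `train` section as `batch_size`; that layout and the helper name `halve_batch_size` are assumptions, so adjust the keys to match your own config.json.

```python
import json


def halve_batch_size(config_path):
    """Halve train.batch_size in a JSON config and return the new value.

    Assumes the config has the shape {"train": {"batch_size": <int>, ...}}.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    train = config["train"]
    # Never go below 1, otherwise no samples would be loaded per step.
    train["batch_size"] = max(1, train["batch_size"] // 2)

    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return train["batch_size"]
```

Halving repeatedly until the OOM error disappears is a common way to find a batch size that fits in VRAM.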

[Error] [Win] Unable to pre-compile async_io on Windows

+1, useless on Windows as there is no way to build it normally.

Mispronounce some words and 44,1 Khz audio

It could be an eSpeak phonemizer problem. You can edit the text preprocessing scripts so that they accept IPA phonemes directly, then change the phonemes as you need.
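One possible way to let preprocessing accept IPA directly is sketched below. It assumes a made-up convention where hand-written IPA is wrapped in curly braces and everything else goes through the usual phonemizer; the brace syntax, the function name `to_phonemes`, and the `phonemize` callable are all hypothetical and not taken from any repository.

```python
import re

# Hypothetical convention: text inside {...} is already IPA and is kept as-is.
IPA_SPAN = re.compile(r"\{([^{}]*)\}")


def to_phonemes(text, phonemize):
    """Phonemize plain text but pass {IPA} spans through unchanged.

    phonemize is any callable mapping a plain-text string to a phoneme string
    (for example, a wrapper around an eSpeak-based phonemizer).
    """
    parts = []
    pos = 0
    for match in IPA_SPAN.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            parts.append(phonemize(plain))
        parts.append(match.group(1).strip())  # manual IPA, untouched
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(phonemize(tail))
    return " ".join(parts)
```

With this in place, a mispronounced word can be overridden in the transcript itself without touching the phonemizer's dictionary.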

Colab won't train

I had the same issue. It seems that the logger doesn't show output immediately, which is why you can't see the error.

is it work on Win platform?

> Excuse me, does this repository support Windows now?

Still not.

How many steps should we train to get the best results?

The authors used 300k steps with batch size 64; start from that.

Update: it seems that the "Ran out of input" error is due to RAM overload, caused by too many workers replicating the loaded audio in RAM. https://github.com/jaywalnut310/vits/pull/118/commits/1c6cd68b1287fad7782eec6d88012ea5ce09d614
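To get a feel for how worker count relates to RAM use, here is a back-of-the-envelope sketch. It assumes the worst case, where the main process and every worker each hold a full copy of the loaded audio; the function name `max_workers_for_ram` and this memory model are assumptions, and real usage may be lower depending on how the OS shares memory between processes.

```python
def max_workers_for_ram(dataset_bytes, available_bytes, requested_workers):
    """Largest worker count whose replicated copies still fit in RAM.

    Worst-case model: one copy in the main process plus one per worker.
    Returns 0 when only the main process copy fits.
    """
    if dataset_bytes <= 0:
        return requested_workers
    copies_that_fit = available_bytes // dataset_bytes
    # Reserve one copy for the main process; the rest can go to workers.
    return max(0, min(requested_workers, copies_that_fit - 1))


GIB = 1024 ** 3
workers = max_workers_for_ram(2 * GIB, 16 * GIB, requested_workers=8)
```

The result can then be passed as the `num_workers` argument of the PyTorch `DataLoader`; a value of 0 means data is loaded in the main process only.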