Nikita Grebenyuk
Would be a great feature!
You may try 44.1 kHz, it worked for me (set in config.json: sampling_rate = 44100). Also make sure your audio is mono (1-channel) 16-bit WAV.
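A quick way to verify the audio format mentioned above is to read the WAV header. This is only a minimal sketch using Python's standard `wave` module; the function name `check_wav` and its `expected_rate` parameter are made up for illustration and are not part of the repository.

```python
import wave


def check_wav(path, expected_rate=44100):
    """Return a list of format problems found in the WAV header (empty if none).

    Hypothetical helper: checks for mono, 16-bit samples, and the expected rate.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample; 2 bytes = 16 bits
        rate = wf.getframerate()
    if channels != 1:
        problems.append(f"expected 1 channel, got {channels}")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {rate} Hz")
    return problems
```

Calling `check_wav("some_file.wav")` on each training file and printing any non-empty result should point out files that need resampling or downmixing before training. The `wave` module only reads uncompressed PCM WAV files, so other encodings will raise an error instead of returning a list.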
> @nikich340 does your speech synthesis have a good result? > My result is ok but the quality of speech is not so good, there is still noise in it...
Removing wavs won't help you with CUDA OOM (mels are loaded into RAM first); you should reduce the batch size instead (= how many mels are loaded into GPU VRAM at...
+1, useless on Windows, as there is no way to build it normally.
It may be an eSpeak phonemizer problem. You can edit the text preprocessing scripts to make them accept IPA phonemes directly and change them as you need.
I had the same issue. It seems that the logger can't show you output immediately, which is why you can't see some errors.
> Excuse me, does this repository support Windows now? Still not.
The authors used 300k steps with batch size = 64; start from that.
Update: it seems that the "Ran out of input" error is due to RAM overload, caused by too many workers replicating loaded audio in RAM. https://github.com/jaywalnut310/vits/pull/118/commits/1c6cd68b1287fad7782eec6d88012ea5ce09d614