Train a voice with a 44 kHz sampling rate
Hi! I have approximately 1.5 hours of voice audio at 44 kHz and would like to train a usable model from it. I don't want to retrain from the pre-trained checkpoints, as they are all 22 kHz and sound muddy. I tried training from scratch, specifying the correct sampling_rate of 44100. I reached 2000 epochs, but the inferred audio was way too fast and skipped words in the process.
What should I modify or patch to make this work?
Thanks!
I suggest resampling your data to 22050 Hz. You can use ffmpeg to do so.
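A minimal sketch of the resampling step, assuming ffmpeg is on your PATH and the dataset is a flat folder of .wav files. The function names and folder layout are hypothetical; `-ar` sets the output sample rate, and `-ac 1` downmixes to mono, which is an assumption about what your training setup expects.

```python
import subprocess
from pathlib import Path


def build_resample_cmd(src, dst, sample_rate=22050):
    """Build an ffmpeg command that resamples one file (mono output is an assumption)."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input file
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to a single channel
        str(dst),
    ]


def resample_dir(in_dir, out_dir, sample_rate=22050):
    """Resample every .wav in in_dir into out_dir, keeping the file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(in_dir).glob("*.wav")):
        cmd = build_resample_cmd(wav, out / wav.name, sample_rate)
        subprocess.run(cmd, check=True)  # raises if ffmpeg fails
```

Calling `resample_dir("wavs_44k", "wavs_22k")` would write 22050 Hz copies next to the originals, so the 44 kHz source files stay untouched.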
I would abstain from that if possible, due to the huge quality loss.
Make sure the sample rate is set correctly everywhere, not just for training but also for inference: https://github.com/search?q=repo%3Arhasspy%2Fpiper%2022050&type=code
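One quick sanity check is to confirm that the dataset files really are at the rate you configured. This is a rough sketch using only the Python standard library; the helper names are hypothetical, and the `wave` module only reads plain PCM WAV files, so other formats would need a different reader.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate stored in a PCM WAV file's header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(wav_dir, expected_rate):
    """Map file name -> actual rate for every .wav that is not at expected_rate."""
    mismatched = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != expected_rate:
            mismatched[path.name] = rate
    return mismatched
```

An empty result from `find_mismatched("wavs", 44100)` means every file matches. The same expected value should then also appear wherever the training and inference code or config refers to the sample rate.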
Other than that, my guess is that you would need to adapt the decoder parameters here: https://github.com/rhasspy/piper/blob/master/src/python/piper_train/vits/config.py#L30
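To illustrate what adapting those parameters could involve, here is a small arithmetic sketch. In a VITS-style decoder the product of the upsample rates has to equal the hop length, and the hop length together with the sample rate fixes how much time one spectrogram frame covers. The baseline values below are assumptions about the 22.05 kHz defaults, so check them against the linked config.py. The 44.1 kHz values are one hypothetical choice, not a tested recipe.

```python
from math import prod

# Assumed 22.05 kHz baseline; verify against config.py in your checkout.
BASE_SAMPLE_RATE = 22050
BASE_HOP_LENGTH = 256
BASE_UPSAMPLE_RATES = (8, 8, 2, 2)

# Hypothetical 44.1 kHz settings: double the hop length so each frame
# still covers the same amount of time, and pick upsample rates whose
# product equals the new hop length.
SAMPLE_RATE_44K = 44100
HOP_LENGTH_44K = BASE_HOP_LENGTH * SAMPLE_RATE_44K // BASE_SAMPLE_RATE
UPSAMPLE_RATES_44K = (8, 8, 4, 2)


def frame_ms(sample_rate, hop_length):
    """Duration in milliseconds covered by one spectrogram frame."""
    return 1000.0 * hop_length / sample_rate


def decoder_matches_hop(upsample_rates, hop_length):
    """True when the decoder turns one frame into exactly hop_length samples."""
    return prod(upsample_rates) == hop_length


if __name__ == "__main__":
    print(frame_ms(BASE_SAMPLE_RATE, BASE_HOP_LENGTH))  # baseline frame duration
    print(frame_ms(SAMPLE_RATE_44K, BASE_HOP_LENGTH))   # half as long if hop is unchanged
    print(frame_ms(SAMPLE_RATE_44K, HOP_LENGTH_44K))    # restored with the doubled hop
    print(decoder_matches_hop(UPSAMPLE_RATES_44K, HOP_LENGTH_44K))
```

If the hop length changes, the window and FFT sizes and the upsample kernel sizes would probably need to scale along with it. That part is a guess rather than something confirmed in this thread.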
@donlk I am trying the exact same thing as you did. Only wish I had seen this before wasting the money on the training. Did you figure out any final solution to this?
The audio parameters that @Luke100000 linked are tuned for 22 kHz (not by me, by the original authors). Did you choose "high" quality when using 44 kHz data?
In my case I went with "medium" quality. Since the docs say that medium and high both use the same sample rate, I figured the result would be the same.