sova-tts-engine
What are the best params and how to check if training is ok?
I started training a model following your short guide: I prepared ~15 hours of Russian audio from 1 to 10 seconds in 16-bit/22050 Hz format and generated the training and validation lists. Usually it is recommended to trim silence from the audio, but the hparams from your models have the trim_silence: False option, why? The transcription texts were prepared with your NLP handler and an edited stress dictionary (to cover some missing words).
Now I am training from scratch with these params: epochs: 100, iters_per_checkpoint: 500, fp16_run: true, warm_start: false (that one only makes sense when continuing from a checkpoint). I also set lr_scheduler_options as they were in Ruslan's hparams.yaml (the default params from this repo produced an error) and batch_size: 5 (my GPU can't handle more, but I hope to get access to a better machine in the future). The other params are as they currently stand in this repo.
Training seems to be running without problems, but I wanted to ask which params exactly you used to train the ruslan/natasha models. What changes are recommended during the training process (learning_rate?), and what should the overall loss, grad norm, and other losses be for results as good as your pre-trained models?
Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?
Hello @nikich340! Sorry for the late reply, I'm catastrophically busy due to a new release that is coming soon.
> Usually it is recommended to trim silence from the audio, but the hparams from your models have the trim_silence: False option, why?
It's because the trim_top_db parameter may vary from one dataset to another, so the user has to experiment and find a suitable value before switching the trim_silence flag on. Keep in mind that inner silence will also be cut out.
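If you want to probe different thresholds before enabling the flag, here is a minimal sketch (my own illustration, not this repo's code) that uses librosa to measure how much audio survives at several top_db values; the file path is a placeholder. Concatenating the non-silent intervals removes inner silence too, matching the behaviour described above:

```python
import numpy as np
import librosa

# placeholder path; point this at one of your own clips
y, sr = librosa.load("path/to/sample.wav", sr=22050)

for top_db in (20, 30, 40, 60):
    # split() returns the non-silent intervals; concatenating them
    # drops leading, trailing and inner silence alike
    intervals = librosa.effects.split(y, top_db=top_db)
    trimmed = np.concatenate([y[start:end] for start, end in intervals])
    print(f"top_db={top_db}: {len(y) / sr:.2f}s -> {len(trimmed) / sr:.2f}s")
```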
> ... which params exactly did you use to train the ruslan/natasha models?
You can get good results with the default parameters from hparams.yaml. If I remember correctly, the same parameters were used to train the Ruslan and Natasha models.
> What changes are recommended during the training process (learning_rate?)
You can experiment with the lr scheduler's settings.
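For illustration only (this is plain PyTorch, not this repo's lr_scheduler_options schema), scheduler experiments usually boil down to picking a scheduler class and its decay parameters:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual Tacotron 2 model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)

# multiply the learning rate by `gamma` after every epoch; try a few
# gamma values and watch how the validation loss responds
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(100):
    # ... training iterations for this epoch go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```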
> ... what should the overall loss, grad norm and other losses be for results as good as your pre-trained models?
A total validation loss of around 0.6 is fine.
> Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?
No. The full checkpoint is saved during training, and it contains the parameters of the model and of the optimizer (and some other values), take a look here. You can strip the values from the checkpoint dict that are unnecessary for inference:
```python
import torch

# load the full training checkpoint onto the CPU
checkpoint_dict = torch.load("path/to/checkpoint", map_location="cpu")

# keep only what inference needs (model weights and hparams);
# the optimizer state and other training bookkeeping are dropped
for key in list(checkpoint_dict.keys()):
    if key not in ["state_dict", "hparams"]:
        checkpoint_dict.pop(key)

torch.save(checkpoint_dict, "path/to/reduced_checkpoint")
```
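As a quick sanity check (same placeholder paths as above), the reduced file should load and expose only the inference keys:

```python
reduced = torch.load("path/to/reduced_checkpoint", map_location="cpu")
print(list(reduced.keys()))  # expected: ['state_dict', 'hparams']
```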