sova-tts-engine
What are the best params and how to check if training is ok?
I started training a model following your short guide: I prepared ~15 hours of Russian audio from 1 to 10 seconds in 16-bit/22050 Hz format and generated the training and validation lists. Usually it is recommended to trim silence from the audio, but the hparams from your models have the trim_silence: False option, why? The transcription texts were prepared with your NLP handler and an edited stress dictionary (to cover some missing words).
Now I am training from scratch with these params: epochs: 100, iters_per_checkpoint: 500, fp16_run: true, warm_start: false (that one only makes sense when continuing from a checkpoint). I also set lr_scheduler_options as they were in Ruslan's hparams.yaml (the default params from this repo produced an error) and batch_size: 5 (my GPU can't handle more, but I hope to get access to a better machine in the future). The other params are as they currently stand in this repo.
Training seems to be running without problems, but I wanted to ask which params exactly you used to train the ruslan/natasha models. What changes are recommended during the training process (learning_rate?), and what should the overall loss, grad norm, and other losses be for results as good as your pre-trained models?
Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?
Hello @nikich340! Sorry for the late reply, I'm catastrophically busy due to a new release that is coming soon.
> Usually it is recommended to trim silence from the audio, but the hparams from your models have the trim_silence: False option, why?
It's because the trim_top_db parameter may vary from one dataset to another, so the user has to experiment and find a suitable value before switching the trim_silence flag on. Keep in mind that inner silence will also be cut out.
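If you want to probe different thresholds before enabling the flag, here is a minimal sketch (my own illustration, not this repo's code) that uses librosa to measure how much audio survives at several top_db values; the file path is a placeholder. Concatenating the non-silent intervals removes inner silence too, matching the behaviour described above:

```python
import numpy as np
import librosa

# placeholder path; point this at one of your own clips
y, sr = librosa.load("path/to/sample.wav", sr=22050)

for top_db in (20, 30, 40, 60):
    # split() returns the non-silent intervals; concatenating them
    # drops leading, trailing and inner silence alike
    intervals = librosa.effects.split(y, top_db=top_db)
    trimmed = np.concatenate([y[start:end] for start, end in intervals])
    print(f"top_db={top_db}: {len(y) / sr:.2f}s -> {len(trimmed) / sr:.2f}s")
```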
> ... which params exactly did you use to train the ruslan/natasha models?
You can get good results with the default parameters from hparams.yaml. If I remember correctly, the same parameters were used to train the Ruslan and Natasha models.
> What changes are recommended during the training process (learning_rate?)
You can experiment with the lr scheduler's settings.
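For illustration only (this is plain PyTorch, not this repo's lr_scheduler_options schema), scheduler experiments usually boil down to picking a scheduler class and its decay parameters:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual Tacotron 2 model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)

# multiply the learning rate by `gamma` after every epoch; try a few
# gamma values and watch how the validation loss responds
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(100):
    # ... training iterations for this epoch go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```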
> ... what should the overall loss, grad norm and other losses be for results as good as your pre-trained models?
A total validation loss of around 0.6 is fine.
> Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?
No. The full checkpoint is saved during training, and it contains the parameters of the model and of the optimizer (and some other values), take a look here. You can strip the values from the checkpoint dict that are unnecessary for inference:
```python
import torch

# load the full training checkpoint onto the CPU
checkpoint_dict = torch.load("path/to/checkpoint", map_location="cpu")

# keep only what inference needs (model weights and hparams);
# the optimizer state and other training bookkeeping are dropped
for key in list(checkpoint_dict.keys()):
    if key not in ["state_dict", "hparams"]:
        checkpoint_dict.pop(key)

torch.save(checkpoint_dict, "path/to/reduced_checkpoint")
```
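As a quick sanity check (same placeholder paths as above), the reduced file should load and expose only the inference keys:

```python
reduced = torch.load("path/to/reduced_checkpoint", map_location="cpu")
print(list(reduced.keys()))  # expected: ['state_dict', 'hparams']
```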