
What are the best params and how to check if training is ok?

nikich340 opened this issue 3 years ago · 1 comment

I started training a model following your short guide: I prepared ~15 hours of Russian audio clips, 1 to 10 seconds long, in 16-bit/22050 Hz format, and generated the training and validation lists. It is usually recommended to trim silence from the audio, but the hparams for your models set trim_silence: False; why? The transcription texts were prepared with your NLP handler and an edited stress dictionary (to cover some missing words).

Now I am training from scratch with these params: epochs: 100, iters_per_checkpoint: 500, fp16_run: true, warm_start: false (it only makes sense when continuing from a checkpoint). I also set lr_scheduler_options as they were in the Ruslan hparams.yaml (the default params from this repo raised an error) and batch_size: 5 (my GPU can't handle more, but I hope to get access to a better machine in the future). The other params are as they currently stand in this repo.

Training seems to be running without problems, but I wanted to ask: which params exactly did you use to train the ruslan/natasha models? What changes are recommended during training (learning_rate?), and what should the overall loss, Grad Norm, and the other losses be to get results as good as your pre-trained models?

Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?

nikich340 avatar Nov 21 '21 17:11 nikich340

Hello @nikich340! Sorry for the late reply; I'm catastrophically busy with a new release that is coming soon.

It is usually recommended to trim silence from the audio, but the hparams for your models set trim_silence: False; why?

It's because the trim_top_db parameter may vary from one dataset to another, so the user has to experiment and find a suitable value before switching the trim_silence flag on. Keep in mind that internal silence (pauses within an utterance) will also be cut out.
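For illustration, one way to pick a suitable value is to sweep several top_db thresholds and see how much audio each one removes. This is a minimal sketch using librosa (an assumption on my part; the engine's actual preprocessing may differ):

import librosa
import numpy as np

# Hypothetical input file; librosa.effects.split returns the non-silent
# intervals, so concatenating them removes internal pauses as well as
# leading/trailing silence, which is why an aggressive top_db can eat speech
audio, sr = librosa.load("sample.wav", sr=22050)

for top_db in (20, 30, 40, 60):
    intervals = librosa.effects.split(audio, top_db=top_db)
    trimmed = np.concatenate([audio[start:end] for start, end in intervals])
    print(f"top_db={top_db}: {len(audio) / sr:.2f}s -> {len(trimmed) / sr:.2f}s kept")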

... which params exactly did you use to train the ruslan/natasha models?

You can get good results using the default parameters from hparams.yaml. If I remember correctly, the same parameters were used to train the Ruslan and Natasha models.

What changes are recommended during training (learning_rate?)

You can experiment with the lr scheduler's settings.
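As an example, here is a plain PyTorch sketch of an exponential decay (the values are hypothetical; in this repo the equivalent knobs live under lr_scheduler_options in hparams.yaml, and the actual scheduler class may differ):

import torch

# Stand-in model; in real training this would be the TTS network
model = torch.nn.Linear(80, 80)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(5):
    optimizer.step()   # the real loop would run batches and backprop here
    scheduler.step()   # decay the learning rate once per epoch
    print(epoch, scheduler.get_last_lr())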

what should the overall loss, Grad Norm, and the other losses be to get results as good as your pre-trained models?

A total validation loss of about 0.6 is fine.

Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?

No, it's because the full checkpoint is saved during training, and it contains the parameters of the model and the optimizer (and some other data); take a look here. You can drop the values from the checkpoint dict that are unnecessary for inference:

import torch

# Load the full training checkpoint onto the CPU
checkpoint_dict = torch.load("path/to/checkpoint", map_location="cpu")

# Keep only the model weights and hyperparameters; the optimizer state,
# iteration counter, etc. are needed only to resume training
for key in list(checkpoint_dict.keys()):
    if key not in ["state_dict", "hparams"]:
        checkpoint_dict.pop(key)

torch.save(checkpoint_dict, "path/to/reduced_checkpoint")
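To check that the reduction worked, you can compare the keys and file sizes of the two checkpoints (the paths are placeholders):

import os
import torch

full = torch.load("path/to/checkpoint", map_location="cpu")
reduced = torch.load("path/to/reduced_checkpoint", map_location="cpu")

# The full checkpoint lists the optimizer state and friends, the reduced one
# only state_dict and hparams; the size gap matches the ~330 MB vs ~110 MB above
print("full keys:   ", sorted(full.keys()))
print("reduced keys:", sorted(reduced.keys()))
for path in ("path/to/checkpoint", "path/to/reduced_checkpoint"):
    print(path, os.path.getsize(path) // 1024, "KB")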

Dekakhrone avatar Dec 03 '21 17:12 Dekakhrone