hifi-gan
ExponentialLR and fine tune
Hi. I started training the model from scratch and noticed that the optimizer uses a decaying learning rate (ExponentialLR). By my calculations, after 2.5 million training steps the learning rate drops to about 3e-7. Not only is such a low learning rate prone to floating-point precision issues, it also makes it impossible to adapt other speakers from such a checkpoint, because the learning rate is simply too small. Does this mean it is better to set lr_decay = 1.0?
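For reference, a rough back-of-the-envelope check of that figure (the exact config values and steps-per-epoch below are assumptions; they depend on your config file, dataset size, and batch size, since the scheduler is stepped once per epoch):

```python
# Minimal sketch: estimate the decayed learning rate after N training steps,
# assuming learning_rate=0.0002, lr_decay=0.999, and the ExponentialLR
# scheduler being stepped once per epoch (values are illustrative).
learning_rate = 0.0002
lr_decay = 0.999
steps_per_epoch = 385          # hypothetical; depends on dataset and batch size
total_steps = 2_500_000

epochs = total_steps // steps_per_epoch
final_lr = learning_rate * lr_decay ** epochs
print(f"after ~{epochs} epochs: lr ≈ {final_lr:.1e}")  # ~3e-7 for these numbers
```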
Hi. In our experiments, we observed that learning rate decay contributes to stable quality improvement. For transfer learning, I would recommend adjusting the learning rate that is loaded from the checkpoint.
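A minimal sketch of what that adjustment could look like after restoring a checkpoint. The names (`state_dict_do`, `optim_g`, `optim_d`, `h.lr_decay`) roughly follow the repo's train.py, and the new learning rate value is hypothetical; treat this as illustrative rather than the project's official fine-tuning procedure:

```python
import torch

# Restore the optimizer states from the checkpoint (as in train.py).
optim_g.load_state_dict(state_dict_do['optim_g'])
optim_d.load_state_dict(state_dict_do['optim_d'])

# Override the heavily decayed learning rate stored in the checkpoint
# before building the schedulers for fine-tuning.
new_lr = 1e-4  # hypothetical; choose based on source/target speaker similarity
for opt in (optim_g, optim_d):
    for group in opt.param_groups:
        group['lr'] = new_lr

# Restart the decay schedule from the new learning rate instead of
# continuing from the checkpoint's last_epoch.
scheduler_g = torch.optim.lr_scheduler.ExponentialLR(optim_g, gamma=h.lr_decay)
scheduler_d = torch.optim.lr_scheduler.ExponentialLR(optim_d, gamma=h.lr_decay)
```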
@jik876 I'm not sure that is the right thing to do. While training with a low learning rate, the optimizer searches for a minimum within progressively smaller regions. If we then raise the learning rate, it will start searching over wider regions again, and much of what the model learned at the smaller learning rate will be forgotten. It seems more correct to me to train the base model at a fixed learning rate, and then adapt the speakers on that model with a decaying learning rate (see the sketch below). Maybe I'm wrong. What do you think?
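A short sketch of the scheme described above, using the standard PyTorch scheduler (the gamma values are illustrative, not the repo's settings):

```python
from torch.optim.lr_scheduler import ExponentialLR

# Base training: keep the learning rate constant (equivalent to lr_decay = 1.0).
base_scheduler = ExponentialLR(optim_g, gamma=1.0)

# Speaker adaptation: decay the learning rate as usual.
adapt_scheduler = ExponentialLR(optim_g, gamma=0.999)
```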
If the purpose of training the source speaker is to build a basis for transfer learning to another speaker, then the method you mentioned may be good. But if you intend to use the source speaker model as it is, then, based on our experiments, using learning rate decay leads to better quality. The choice of learning rate for transfer learning depends on how different the source and target speakers are: if the speech characteristics of the two speakers are similar, a small learning rate will yield quite good results, and vice versa.