hifi-gan
Fine-tuning from pretrained models?
The pretrained models have names like: generator_v1
However, train.py looks for checkpoints with the following code:
if os.path.isdir(a.checkpoint_path):
    cp_g = scan_checkpoint(a.checkpoint_path, 'g_')
    cp_do = scan_checkpoint(a.checkpoint_path, 'do_')
Where are the pretrained checkpoints we can use for fine-tuning? Or can you clarify how to invoke the script for fine-tuning?
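For context, `scan_checkpoint` in the repo's `utils.py` looks for files named with the given prefix followed by an 8-digit step number (e.g. `g_00000000`), which is why a file named `generator_v1` is never picked up. A minimal sketch of that behavior (the temp-dir setup and file names below are just for illustration; renaming the pretrained file to the `g_XXXXXXXX` convention is one way people work around this, though the repo does not document it):

```python
import glob
import os
import tempfile

def scan_checkpoint(cp_dir, prefix):
    """Return the latest checkpoint matching <prefix> + 8 digits, or None."""
    cp_list = glob.glob(os.path.join(cp_dir, prefix + '????????'))
    if not cp_list:
        return None
    return sorted(cp_list)[-1]

with tempfile.TemporaryDirectory() as d:
    # A pretrained file named 'generator_v1' does not match the pattern...
    open(os.path.join(d, 'generator_v1'), 'w').close()
    print(scan_checkpoint(d, 'g_'))  # -> None

    # ...but renamed to the g_<8-digit-step> convention, it is found:
    os.rename(os.path.join(d, 'generator_v1'),
              os.path.join(d, 'g_00000000'))
    print(scan_checkpoint(d, 'g_'))  # -> .../g_00000000
```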
I think all models except the universal one are shared just to run the demo. In that case, the fine-tuning path is redundant. Otherwise, we would have to change the code you quoted so that it loads only the generator and starts from epoch 0 with the pre-trained generator.
Could you explain what you mean?
If I am using UNIVERSAL to fine-tune for a particular use case, should I change last_epoch to 0 in train.py? I'm concerned that I'm starting with a learning rate that has already had a very large ExponentialLR lr_decay applied to it, and that maybe I should start with the original learning rate instead?
https://drive.google.com/drive/folders/1YuOoV3lO2-Hhn1F2HJ2aQ4S0LC1JdKLd
@jmasterx yes, I have seen the universal model. If you are fine-tuning from the universal model using the existing config: since it has already run many epochs, does fine-tuning start with a very low LR? Is the ExponentialLR lr_decay applied? Should I change last_epoch to 0 in train.py so that fine-tuning starts at the initial LR? Or do I actually want to fine-tune from a very small LR that has already been exponentially decayed?
My question is more about the details of how fine-tuning is applied in practice. Does it typically use the current LR at this epoch, or is the LR reset? I can't find any written information about this.
The schedulers are created like this:
scheduler_g = torch.optim.lr_scheduler.ExponentialLR(optim_g, gamma=h.lr_decay, last_epoch=last_epoch)
scheduler_d = torch.optim.lr_scheduler.ExponentialLR(optim_d, gamma=h.lr_decay, last_epoch=last_epoch)
and the epoch info is loaded from the state dict:
last_epoch = state_dict_do['epoch']
So it would make sense to me that training picks back up where it left off, with the decayed LR. And if you continue training on a new dataset, the LR decay would still apply, so it would learn much more slowly than at step 0.
If you want to train from scratch but starting from the Universal weights, then probably clear the steps and last_epoch (i.e., don't load them from the state dict).
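To make the effect concrete, ExponentialLR multiplies the base LR by gamma once per epoch, so resuming at a large last_epoch starts from a much smaller LR. The base LR and decay below match the values in the repo's config_v1.json (lr=0.0002, lr_decay=0.999); the epoch count is made up purely for illustration:

```python
# Effective LR when ExponentialLR resumes at a given last_epoch.
base_lr = 0.0002   # 'learning_rate' in config_v1.json
gamma = 0.999      # 'lr_decay' in config_v1.json

def effective_lr(last_epoch):
    # ExponentialLR: lr = base_lr * gamma ** epoch
    return base_lr * gamma ** last_epoch

print(effective_lr(0))     # fresh training: 0.0002
print(effective_lr(2000))  # resumed after 2000 epochs: ~0.0002 * 0.135 = ~2.7e-5
```

So a run resumed after a couple of thousand epochs trains at roughly an order of magnitude below the initial LR, which is the behavior being debated above.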
@jmasterx thanks, that was my thought too. Any idea what was done by the authors when they released the fine-tuned models? @jik876 ?
Could you explain what you mean?
If I am using UNIVERSAL to fine-tune for a particular use case, should I change last_epoch to 0 in train.py? I'm concerned that I'm starting with a learning rate that has already had a very large ExponentialLR lr_decay applied to it, and that maybe I should start with the original learning rate instead?
I fine-tuned UNIVERSAL and it is not required to set last_epoch=0; training automatically continues. The universal one is already okay to use. However, I don't understand the other models. I loaded a generator and trained a model, but it seems to start from zero, not like fine-tuning.
@EmreOzkose I guess that's because the author shared the discriminator only for the universal model.
@thepowerfuldeez The universal model has both generator and discriminator?
Yes, the link I provided contains both the do_ file (discriminator weights and optimizer state) and the g_ file (generator weights).
@jmasterx When fine-tuning on an existing model, like Tacotron2, did the weights and LR schedule start again at 0, or at the epoch the fine-tuning started at?
Thank you for the link
I think all models except the universal one are shared just to run the demo. In that case, the fine-tuning path is redundant. Otherwise, we would have to change the code you quoted so that it loads only the generator and starts from epoch 0 with the pre-trained generator.
It seems that fine-tuning is just performed for end-to-end TTS synthesis, i.e., the model is fine-tuned on mel-spectrograms synthesized by the TTS acoustic model, such as Tacotron2.
Is it possible to fine-tune VCTK_V2 using the discriminator from UNIVERSAL_V1?