taming-transformers
Was a learning rate (lr) scheduler used?
Hi CompVis group, thanks for your impressive work; I'd love to use your method in my project.
However, I found that no learning rate (lr) scheduler is used during training.
More precisely, `configure_optimizers` in ./taming/models/vqgan.py only returns the two optimizers.
And `self.learning_rate` is set via `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr` (in main.py), which is a constant.
```python
def configure_optimizers(self):
    # `self.learning_rate` is assigned from outside the class (in main.py)
    lr = self.learning_rate
    opt_ae = torch.optim.Adam(list(self.encoder.parameters()) +
                              list(self.decoder.parameters()) +
                              list(self.quantize.parameters()) +
                              list(self.quant_conv.parameters()) +
                              list(self.post_quant_conv.parameters()),
                              lr=lr, betas=(0.5, 0.9))
    opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                lr=lr, betas=(0.5, 0.9))
    return [opt_ae, opt_disc], []
```
That seems strange. To my knowledge, when training a big model from scratch, the learning rate should be adjusted by a scheduler: a large lr is needed at the beginning and a smaller one for later steps.
Did I misunderstand anything? Could you please give me a hint? Thanks in advance!
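For anyone reading along, here is a minimal sketch of how a scheduler could be attached, assuming PyTorch Lightning's convention that `configure_optimizers` may return `(optimizers, schedulers)`. The tiny parameter list and the `StepLR` choice are illustrative assumptions, not the authors' code:

```python
import torch

# Illustrative only: a tiny parameter set standing in for the autoencoder.
params = [torch.nn.Parameter(torch.zeros(1))]
opt_ae = torch.optim.Adam(params, lr=1e-3, betas=(0.5, 0.9))

# A step decay: multiply the lr by 0.1 every 10 scheduler steps.
sched_ae = torch.optim.lr_scheduler.StepLR(opt_ae, step_size=10, gamma=0.1)

# In Lightning, configure_optimizers could then end with
#     return [opt_ae, opt_disc], [sched_ae]
# instead of returning an empty scheduler list, as the repo does.
```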
I desperately want to know, especially for the training setting with Gumbel-Softmax (the official config is here and as follows: https://heibox.uni-heidelberg.de/d/2e5662443a6b4307b470/).
Sincere thanks to anyone who can offer help.
```yaml
model:
  base_learning_rate: 4.5e-06
  target: taming.models.vqgan.GumbelVQ
  params:
    kl_weight: 1.0e-08
    embed_dim: 256
    n_embed: 8192
    monitor: val/rec_loss
    temperature_scheduler_config:
      target: taming.lr_scheduler.LambdaWarmUpCosineScheduler
      params:
        warm_up_steps: 0
        max_decay_steps: 1000001
        lr_start: 0.9
        lr_max: 0.9
        lr_min: 1.0e-06
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 32
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.DummyLoss
```
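Note that the `LambdaWarmUpCosineScheduler` in this config anneals the Gumbel-Softmax temperature, not the learning rate. As a reference point, here is a pure-Python sketch of what a warm-up cosine schedule with these parameter names typically computes; the exact implementation in `taming.lr_scheduler` may differ in details:

```python
import math

def warmup_cosine(step, warm_up_steps, lr_start, lr_max, lr_min, max_decay_steps):
    """Linear warm-up from lr_start to lr_max, then cosine decay to lr_min."""
    if step < warm_up_steps:
        return lr_start + (lr_max - lr_start) * step / warm_up_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    t = min(step - warm_up_steps, max_decay_steps) / max_decay_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

# With the config above (warm_up_steps=0, lr_start=lr_max=0.9, lr_min=1e-6),
# the temperature starts at 0.9 and decays toward 1e-6 over ~1e6 steps.
```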
For `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr`: if `ngpu` and `bs` are both large, the learning rate becomes very large. Is that acceptable?
I don't think `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr` is a good setting for the learning rate. In my case, ngpus=8, bs=3, base_lr=0.0625 (which I took to be the default in the config), and with this setting my train_loss can't go down. Has anyone gotten good results with this learning rate setting?
I suddenly noticed that the default lr in the config file is 4.5e-6, not 0.0625, so maybe that was the problem.
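To put numbers on this, assuming `accumulate_grad_batches=1`, the effective lr under both readings of `base_lr` works out as follows (simple arithmetic on the scaling rule quoted above, not repo code):

```python
def effective_lr(accumulate_grad_batches, ngpu, bs, base_lr):
    # The scaling rule from main.py: lr grows with the total batch size.
    return accumulate_grad_batches * ngpu * bs * base_lr

# base_lr misread as 0.0625: 1 * 8 * 3 * 0.0625 = 1.5 (far too large for Adam)
too_big = effective_lr(1, 8, 3, 0.0625)
# config default base_lr = 4.5e-6: 1 * 8 * 3 * 4.5e-6 = 1.08e-4 (a typical Adam lr)
reasonable = effective_lr(1, 8, 3, 4.5e-06)
```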