taming-transformers
Was a learning rate (lr) scheduler used?
Hi CompVis group, thanks for your impressive work; I'd love to use your method in my project.
However, I found that no learning rate (lr) scheduler is used during training.
More precisely, `configure_optimizers` in ./taming/models/vqgan.py only returns the two optimizers.
And `self.learning_rate` is set via `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr` (in main.py), which is a constant.
```python
def configure_optimizers(self):
    # `self.learning_rate` is assigned from outside the class (in main.py)
    lr = self.learning_rate
    opt_ae = torch.optim.Adam(list(self.encoder.parameters()) +
                              list(self.decoder.parameters()) +
                              list(self.quantize.parameters()) +
                              list(self.quant_conv.parameters()) +
                              list(self.post_quant_conv.parameters()),
                              lr=lr, betas=(0.5, 0.9))
    opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                lr=lr, betas=(0.5, 0.9))
    return [opt_ae, opt_disc], []
```
That seems strange. To my knowledge, when training a big model from scratch, the learning rate should be adjusted by a scheduler: a large lr is needed at the beginning and a smaller one for later steps.
Did I misunderstand anything? Could you please give me a hint? Thanks in advance!
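For anyone reading along, here is a minimal sketch of how a scheduler could be attached, assuming PyTorch Lightning's convention that `configure_optimizers` may return `(optimizers, schedulers)`. The tiny parameter list and the `StepLR` choice are illustrative assumptions, not the authors' code:

```python
import torch

# Illustrative only: a tiny parameter set standing in for the autoencoder.
params = [torch.nn.Parameter(torch.zeros(1))]
opt_ae = torch.optim.Adam(params, lr=1e-3, betas=(0.5, 0.9))

# A step decay: multiply the lr by 0.1 every 10 scheduler steps.
sched_ae = torch.optim.lr_scheduler.StepLR(opt_ae, step_size=10, gamma=0.1)

# In Lightning, configure_optimizers could then end with
#     return [opt_ae, opt_disc], [sched_ae]
# instead of returning an empty scheduler list, as the repo does.
```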
I desperately want to know, especially for the training setting with Gumbel-Softmax (the official config is here and as follows: https://heibox.uni-heidelberg.de/d/2e5662443a6b4307b470/).
Sincere thanks to anyone who can offer help.
```yaml
model:
  base_learning_rate: 4.5e-06
  target: taming.models.vqgan.GumbelVQ
  params:
    kl_weight: 1.0e-08
    embed_dim: 256
    n_embed: 8192
    monitor: val/rec_loss
    temperature_scheduler_config:
      target: taming.lr_scheduler.LambdaWarmUpCosineScheduler
      params:
        warm_up_steps: 0
        max_decay_steps: 1000001
        lr_start: 0.9
        lr_max: 0.9
        lr_min: 1.0e-06
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 32
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.DummyLoss
```
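Note that the `LambdaWarmUpCosineScheduler` in this config anneals the Gumbel-Softmax temperature, not the learning rate. As a reference point, here is a pure-Python sketch of what a warm-up cosine schedule with these parameter names typically computes; the exact implementation in `taming.lr_scheduler` may differ in details:

```python
import math

def warmup_cosine(step, warm_up_steps, lr_start, lr_max, lr_min, max_decay_steps):
    """Linear warm-up from lr_start to lr_max, then cosine decay to lr_min."""
    if step < warm_up_steps:
        return lr_start + (lr_max - lr_start) * step / warm_up_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    t = min(step - warm_up_steps, max_decay_steps) / max_decay_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

# With the config above (warm_up_steps=0, lr_start=lr_max=0.9, lr_min=1e-6),
# the temperature starts at 0.9 and decays toward 1e-6 over ~1e6 steps.
```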
For `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr`: if `ngpu` and `bs` are both large, the learning rate becomes very large. Is that acceptable?
I don't think `model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr` is a good setting for the learning rate. In my case, ngpus=8, bs=3, base_lr=0.0625 (which I took to be the default in the config), and with this setting my train_loss can't go down. Has anyone gotten good results with this learning rate setting?
I suddenly noticed that the default lr in the config file is 4.5e-6, not 0.0625, so maybe that was the problem.
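To put numbers on this, assuming `accumulate_grad_batches=1`, the effective lr under both readings of `base_lr` works out as follows (simple arithmetic on the scaling rule quoted above, not repo code):

```python
def effective_lr(accumulate_grad_batches, ngpu, bs, base_lr):
    # The scaling rule from main.py: lr grows with the total batch size.
    return accumulate_grad_batches * ngpu * bs * base_lr

# base_lr misread as 0.0625: 1 * 8 * 3 * 0.0625 = 1.5 (far too large for Adam)
too_big = effective_lr(1, 8, 3, 0.0625)
# config default base_lr = 4.5e-6: 1 * 8 * 3 * 4.5e-6 = 1.08e-4 (a typical Adam lr)
reasonable = effective_lr(1, 8, 3, 4.5e-06)
```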