
CosineAnnealingLR


The parameters of the CosineAnnealingLR scheduler in valle_trainer.py seem different from the PyTorch docs.

code:

            scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                self.cfg.train.warmup_steps,
                self.optimizer,
                eta_min=self.cfg.train.base_lr,
            )

PyTorch 2.0 docs: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

Should T_max be set to warmup_steps, or does it need another special setting?
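For comparison, a call that follows the documented argument order would pass the optimizer first and T_max second. This is only a sketch of the documented signature; total_steps is a hypothetical placeholder, not a setting from the repo:

    # Sketch of a call matching the documented PyTorch 2.0 signature.
    # total_steps is a hypothetical placeholder, not an Amphion config field.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        self.optimizer,      # optimizer is the first positional argument
        T_max=total_steps,   # number of steps over which the lr anneals
        eta_min=0.0,         # minimum learning rate at the end of the cycle
    )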

lixuyuan102 avatar Dec 19 '23 11:12 lixuyuan102

Hi, we have opened a PR to fix the problem. You can check it! (We use: from diffusers.optimization import get_cosine_schedule_with_warmup.)
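For anyone following along, the diffusers helper is typically wired up roughly as below; the step counts are placeholders, not the values chosen in the PR:

    from diffusers.optimization import get_cosine_schedule_with_warmup

    # Sketch only: num_warmup_steps / num_training_steps are placeholders,
    # not the values from the Amphion PR.
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=2000,
        num_training_steps=200000,
    )
    # scheduler.step() is then called once per optimizer step.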

HeCheng0625 avatar Dec 21 '23 02:12 HeCheng0625


Thanks for the reply. I'm using the NoamScheduler with a base_lr of 0.05 and 2,000 warmup steps to train VALL-E. Have you tested the differences between the NoamScheduler and get_cosine_schedule_with_warmup? If so, could you share which is better and the best parameters? Thanks again, and I'm looking forward to your reply.
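(For context, the generic Noam / inverse-square-root schedule from "Attention Is All You Need" is sketched below; Amphion's NoamScheduler may scale things differently, so treat this only as a reference.)

    # Generic Noam schedule (Vaswani et al., 2017); Amphion's NoamScheduler
    # may differ in how base_lr and d_model enter the formula.
    def noam_lr(step, base_lr, warmup_steps, d_model=512):
        step = max(step, 1)
        return base_lr * (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)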

lixuyuan102 avatar Dec 21 '23 03:12 lixuyuan102

@HeCheng0625 Please follow up on this issue.

lmxue avatar Dec 27 '23 03:12 lmxue

Hi, we haven't tested the NoamScheduler. I think using AdamW with a learning rate between 5e-5 and 1e-4 and a cosine schedule with 5K to 10K warmup steps will give a more stable training process.
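Concretely, that recommendation might look like the following; the exact values are illustrative picks from within the suggested ranges, not tested settings:

    import torch
    from diffusers.optimization import get_cosine_schedule_with_warmup

    # Illustrative values within the suggested ranges; not tested settings.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # 5e-5 to 1e-4
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=5000,       # 5K to 10K suggested
        num_training_steps=500000,   # placeholder total training steps
    )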

HeCheng0625 avatar Dec 27 '23 06:12 HeCheng0625


Thanks for sharing.

lixuyuan102 avatar Dec 27 '23 08:12 lixuyuan102