Xiangning Chen
Hey, has the sample_weight problem been solved now? Thanks
@mitchellnw Thanks for the experiments! I noticed that you used betas=(0.9, 0.95) for AdamW compared to the default betas=(0.9, 0.999), while for Lion it's still the default betas=(0.9, 0.99),...
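For concreteness, a minimal sketch of the two settings being compared here, assuming `torch.optim.AdamW` and the `lion_pytorch` package; the model and the learning rate / weight decay values below are placeholders, not the ones from the experiments:

```python
import torch
from lion_pytorch import Lion  # https://github.com/lucidrains/lion-pytorch

model = torch.nn.Linear(768, 768)  # placeholder module

# AdamW with the non-default beta2 = 0.95 mentioned above
adamw = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.95), weight_decay=0.1
)

# Lion left at its default betas = (0.9, 0.99)
lion = Lion(
    model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=1.0
)
```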
@lucidrains Thanks for sharing the good news! We always used the same learning rate schedule as AdamW in our experiments, including cosine decay, linear decay, and constant (all with 10K...
@mitchellnw Thank you! Actually, I would decrease the WD when raising the LR to maintain the effective weight decay strength LR*WD.
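As a rough sketch of that coupling (the baseline numbers here are illustrative, not values from the thread): keep the product LR*WD fixed when rescaling the learning rate.

```python
# Illustrative only: preserve the effective decay strength lr * wd.
base_lr, base_wd = 1e-4, 1.0           # hypothetical baseline
target_effective = base_lr * base_wd   # lr * wd to keep constant

new_lr = 3e-4                          # e.g., raising the LR by 3x
new_wd = target_effective / new_lr     # so the WD shrinks by the same factor

print(new_lr, new_wd)  # 0.0003 0.333...
```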
Thanks for the experiments! May I know the learning rate schedule and warmup iterations? @mitchellnw Also, the wd is still 0.2 for AdamW, right? What about lr=4e-4, betas=(0.95, 0.98), wd=1.0...
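If it helps, the suggested setting would look like this with the `lion_pytorch` implementation (the model below is a stand-in):

```python
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(768, 768)  # stand-in module
optimizer = Lion(
    model.parameters(), lr=4e-4, betas=(0.95, 0.98), weight_decay=1.0
)
```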
@lucidrains May I know on what domains they observe faster convergence but worse generalization? Thanks! To me, this appears to be a consequence of using a small learning rate,...
@mitchellnw May I know the warmup iterations and learning rate schedule? Thanks!
@mitchellnw Thanks for the information! I quickly tested training ViT-B/16 with 20K steps and batch size 4,096. I will try CLIP training as well. Here are the...
> For AdamW do you think performance could improve by, e.g., moving away from default beta2 to 0.98 or 0.95. If using the same learning rate, I don't think this...
@mitchellnw I have some updates to share! I discovered that the ***initial temperature value*** has an impact, and tuning it has resulted in better performance for Lion compared to AdamW...
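A hedged sketch of what that knob could look like, assuming the temperature refers to the learnable logit scale used in CLIP-style contrastive training; the 0.07 default below is the common initialization, not the tuned value from these experiments:

```python
import numpy as np
import torch

# CLIP-style learnable temperature, stored as a log-scale parameter.
init_temperature = 0.07  # common default; the tuned value is not shown here
logit_scale = torch.nn.Parameter(torch.ones([]) * np.log(1.0 / init_temperature))

# At loss time, cosine similarities are multiplied by the (clamped) exponentiated scale.
image_features = torch.randn(8, 512)
text_features = torch.randn(8, 512)
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
logits = logit_scale.exp().clamp(max=100.0) * image_features @ text_features.t()
```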