
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

12 Adan issues

Is there a TensorFlow/Keras implementation of Adan? If there is no official version, do you know of any third-party implementation? Alternatively, roughly how many lines would you expect an implementation to have?...

Hi, very interesting work! The only problem I see is that your optimizer is slower than SGD/AdamW, which may discourage some people from using it. Do you plan adding an...

Hi, thank you very much for your brilliant work on Adan! Your paper says Adan should reach a lower loss (both train and test) than AdamW according...

Hello! I think I found a bug in the Adan optimizer that affects embedding tables. I implemented the Adan optimizer in TensorFlow 2; you can find the implementation [here](https://github.com/DenisVorotyntsev/Adan). I wanted...

Hello, have you investigated using Adan to train diffusion models? How should its learning rate be set? Can it be the same as the learning rate used with AdamW?

Could you please release the pre-trained ViT-S based on MAE?

Hi, Adan is an excellent optimizer; thank you for your work. However, when I recently tried instruction fine-tuning with Adan, I found that the loss curve looks great, yet downstream task performance (GSM-8K) falls short of expectations. With the same data processing and evaluation, AdamW scores about 9.63 while Adan reaches only about 5.08. AdamW hyperparameters: weight_decay 0.01, lr 2e-5. Adan hyperparameters: weight_decay 0.02; following the repo's suggestion I tried lr 2e-4 and 1e-4, and GSM-8K stays low in both cases. Both runs use an lr scheduler that warms up over the first 3% of steps to the peak and then decays to 0. AdamW training loss curve: (image) Adan training loss curve: (image) Code used: ```python from adan import Adan optimizer = Adan(model.parameters(), lr=args.lr, weight_decay=0.02, foreach=True, fused=True) ```...
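For reference, the hyperparameter comparison reported in this issue can be summarized as a plain config sketch (values are taken from the issue text above; this is illustrative only, not official tuning guidance from the Adan authors):

```python
# Hyperparameters as reported in this GSM-8K fine-tuning issue
# (illustrative sketch, not an official recommendation).
adamw_cfg = {"optimizer": "AdamW", "weight_decay": 0.01, "lr": 2e-5}
adan_cfg = {"optimizer": "Adan", "weight_decay": 0.02, "lr": 2e-4}  # lr=1e-4 was also tried

# Per the repo's suggestion, the Adan peak lr tried here is about 10x AdamW's.
ratio = adan_cfg["lr"] / adamw_cfg["lr"]
print(f"Adan/AdamW peak-lr ratio: {ratio:.1f}")
```

Both configurations pair with the same warmup-then-decay schedule (3% warmup to peak, then decay to 0), so the lr and weight_decay values are the only optimizer-side differences between the two runs.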

Dear authors: According to the `README.md` of this amazing project, the `weight_decay` param should be `0.02`, while in the configuration file attached in https://github.com/sail-sg/Adan/issues/32, the `WD` seems to be `0.05`...

> The following steps are modified from [Fairseq-Roberta](https://github.com/facebookresearch/fairseq/blob/main/examples/roberta/README.pretraining.md). For completeness, we list some key steps here.

I would like to ask why you modified the dataset settings. In the original...