STAM
Could you please share training hyper-parameters?
Hello,
This work is really inspiring, and thanks for sharing the code. Could you please also share the training hyper-parameters (e.g., learning rate, optimizer, warmup learning rate, warmup epochs)? I would really like to train the model myself to get a deeper understanding of it.
Thanks, Steve
Hi,
thanks for taking an interest in this work.
The training hyper-parameters for stam_16 are:
- batch size: 64
- optimizer: AdamW with weight decay 1e-3
- schedule: 100 epochs with cosine annealing and learning-rate warm-up over the first 10 epochs
- base learning rate: 1e-5
- model EMA enabled

For stam_64, same as above, except batch size 16 and learning rate 2.5e-6.
The models were trained on a single 8xV100 machine.
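For reference, here is a minimal PyTorch sketch of how that stam_16 recipe could be wired up. This is not the repo's actual training code; the `Linear` placeholder model, the EMA decay of 0.999, and the exact warm-up shape are assumptions made only for illustration.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 100
WARMUP_EPOCHS = 10
BASE_LR = 1e-5          # stam_16; use 2.5e-6 for stam_64
WEIGHT_DECAY = 1e-3
BATCH_SIZE = 64         # stam_16; 16 for stam_64 (used when building the DataLoader, not shown)
EMA_DECAY = 0.999       # assumed value; the reply does not state the EMA decay

model = torch.nn.Linear(8, 8)  # placeholder standing in for the stam_16 model
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def lr_lambda(epoch: int) -> float:
    # Linear warm-up over the first 10 epochs, then cosine annealing toward zero.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

# Exponential moving average of the weights ("model EMA").
ema_model = torch.optim.swa_utils.AveragedModel(
    model,
    avg_fn=lambda ema_p, cur_p, num: EMA_DECAY * ema_p + (1.0 - EMA_DECAY) * cur_p,
)

for epoch in range(EPOCHS):
    # train_one_epoch(model, optimizer, loader)  # your training loop goes here
    ema_model.update_parameters(model)
    scheduler.step()
```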
Hope you find this useful.
Could you please share the training code? Thanks!