Liyuan Liu
Thanks for bringing this up. In our analysis & experiments, we haven't tried any learning rate restarts. I agree this issue may be due to numerical instability or the algorithm design. Will...
Also, RAdam didn't obviate all need for warmup :( we found that in some cases adding additional warmup gets better performance (some discussions are at: https://github.com/LiyuanLucasLiu/RAdam#questions-and-discussions).
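If it helps, here is a minimal sketch of stacking a linear warmup on top of RAdam with PyTorch's `LambdaLR`; the import path, the toy model, and `warmup_steps=500` are illustrative assumptions, not settings from the repo or the discussions linked above.

```python
import torch
from radam import RAdam  # assumes the repo's radam.py is importable

model = torch.nn.Linear(10, 2)
optimizer = RAdam(model.parameters(), lr=1e-3)

# Linear warmup over the first `warmup_steps` updates, applied on top of
# RAdam's own rectification; warmup_steps is an illustrative value.
warmup_steps = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps),
)

# In the training loop, call scheduler.step() after each optimizer.step().
```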
@e-sha the `step_size` here is not the learning rate, but more like a step-size ratio. When `N_sma` < 5, the adaptive learning rate is turned off, and `step_size`...
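For illustration, a small sketch of the step-size-ratio logic described above; the function name and defaults are mine, and the formula is a paraphrase of the rectification term from the paper rather than a copy of the repo's `radam.py`.

```python
import math

def radam_step_size(step, beta1=0.9, beta2=0.999):
    """Return (step-size ratio, adaptive?) for update `step` (1-indexed)."""
    beta2_t = beta2 ** step
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    n_sma = n_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)

    if n_sma >= 5:
        # Variance is tractable: apply the rectification term and keep the
        # adaptive (second-moment) denominator in the actual update.
        rect = math.sqrt(
            (n_sma - 4) / (n_sma_max - 4)
            * (n_sma - 2) / n_sma
            * n_sma_max / (n_sma_max - 2)
        )
        return rect * math.sqrt(1.0 - beta2_t) / (1.0 - beta1 ** step), True
    # Variance is intractable (early steps): fall back to an SGDM-style
    # update with only the first-moment bias correction.
    return 1.0 / (1.0 - beta1 ** step), False
```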
Thanks for letting us know, @e-sha. Can you provide a full script to reproduce the result? I'm not sure why `RAdam` behaves in this way. Intuitively, SGDM should be more...
@e-sha Thanks for letting us know :-) I guess you mean the problem is in the parameter values, or the gradient values? I think in the first iteration SGDM, although with...
I see, I understand it now, thanks for sharing. BTW, people find that using gradient clipping also helps stabilize model training.
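For example, a minimal sketch of clipping the global gradient norm before the update step with PyTorch's `clip_grad_norm_`; the toy model, the data, and `max_norm=1.0` are placeholders, not recommendations from this thread.

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x, y = torch.randn(8, 10), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# Clip the global gradient norm, then apply the optimizer update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```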
@e-sha I added an option to decide whether to use SGDM: https://github.com/LiyuanLucasLiu/RAdam/commit/373b3e405c7f8d24fe068aee0472e5c3ae231cdc
Thanks for reaching out. I haven't observed this, and I'm wondering whether you can provide a simple setup to reproduce the phenomenon. BTW, there is a known issue that can...
Hi @Tony-Y, I'm curious why you prefer to use Adam with a warmup instead of RAdam. I think the very basic fact both papers agree on is that it's necessary...
Hi, it would help to provide a script / setting to reproduce the error.