Evgenii Shalnov

5 comments by Evgenii Shalnov

I found that the problem is in the initial 5 steps, specifically in this code: ``` if N_sma >= 5: step_size = math.sqrt((1 - beta2_t) * (N_sma -...
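As a rough illustration of why the `N_sma >= 5` check affects roughly the first five steps, here is a minimal sketch. It assumes the usual RAdam definition of `N_sma` and the common default `beta2 = 0.999`; the helper names `n_sma` and `first_rectified_step` are hypothetical and do not come from the quoted code.

```python
def n_sma(step, beta2=0.999):
    """Approximated SMA length at a given step (beta2 value is an assumption)."""
    beta2_t = beta2 ** step
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    return n_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)


def first_rectified_step(beta2=0.999, threshold=5.0):
    """First step at which the `N_sma >= threshold` branch would be taken."""
    step = 1
    while n_sma(step, beta2) < threshold:
        step += 1
    return step


if __name__ == "__main__":
    for t in range(1, 8):
        print(t, round(n_sma(t), 3))
    print("first step with N_sma >= 5:", first_rectified_step())
```

Under these assumptions `N_sma` stays just below the step index early in training, so the branch guarded by `N_sma >= 5` is skipped for steps 1 through 5.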

I got it. You are right. So at the initialization stage RAdam works exactly like SGD with momentum. Nevertheless, it seems that optimal learning rates for SGD with momentum and...

I found that the problem is in the gradient values. The gradients w.r.t. some of the parameters are bigger than 1. Thus, on the first iteration of training, SDGW just multiplies them...

Yes, you are right. In the first iteration SGDM makes a step equal to learning_rate * gradient. In my particular case some gradient values are >> 1. Thus SGDM makes...
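The first-iteration behaviour described here can be sketched with a scalar parameter. This is a minimal illustration, assuming a momentum buffer that starts empty and no dampening; `sgdm_step` and the numbers used are hypothetical, not taken from the original code.

```python
def sgdm_step(param, grad, buf, lr, momentum=0.9):
    """One SGD-with-momentum update; buf is None before the first step."""
    # On the first step the buffer is just the gradient itself.
    buf = grad if buf is None else momentum * buf + grad
    return param - lr * buf, buf


lr = 0.01
grad = 250.0  # illustrative gradient value much larger than 1
start = 1.0

param, buf = sgdm_step(start, grad, None, lr)
first_step = start - param  # size of the first update
```

With an empty buffer the first update is exactly `lr * grad`, so a gradient much larger than 1 moves the parameter far more than the learning rate alone suggests.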

Hello, I think [this](https://github.com/uzh-rpg/rpg_e2vid/pull/24) pull request should help you with the issue.