Ranger-Deep-Learning-Optimizer
step_size is too large during the initialization stage
I found that step_size is too large during the first 5 steps. The problem is in this code:
if N_sma >= self.N_sma_threshhold:
    # rectified branch: variance rectification term divided by the first-moment bias correction
    step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4)
                          * (N_sma - 2) / N_sma
                          * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
else:
    # un-rectified branch used for the first steps: only the first-moment bias correction
    step_size = 1.0 / (1 - beta1 ** state['step'])
If betas are set to (0.9, 0.999), the internal variables change as follows:
state['step'] | step_size
--------------|-----------
1             | 10
2             | 5.26315789
3             | 3.6900369
4             | 2.90782204
5             | 2.44194281
6             | 0.00426327
7             | 0.00524248
8             | 0.00607304
9             | 0.00681674
10            | 0.00750596
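For reference, here is a minimal standalone sketch that reproduces the numbers above (assuming betas = (0.9, 0.999) and N_sma_threshhold = 5, and computing N_sma_max and N_sma the same way the optimizer's step does):

import math

beta1, beta2 = 0.9, 0.999
N_sma_threshhold = 5
N_sma_max = 2 / (1 - beta2) - 1  # maximum length of the approximated SMA

for step in range(1, 11):
    beta2_t = beta2 ** step
    N_sma = N_sma_max - 2 * step * beta2_t / (1 - beta2_t)
    if N_sma >= N_sma_threshhold:
        # rectified branch
        step_size = math.sqrt(
            (1 - beta2_t)
            * (N_sma - 4) / (N_sma_max - 4)
            * (N_sma - 2) / N_sma
            * N_sma_max / (N_sma_max - 2)
        ) / (1 - beta1 ** step)
    else:
        # un-rectified branch used for the first steps
        step_size = 1.0 / (1 - beta1 ** step)
    print(step, step_size)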
Note that step_size does not depend on the gradient value; it scales the learning rate. As a result, RAdam aggressively moves the weights away from their initial values during the first steps, even if they have a good initialization.
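As a rough illustration (assuming a nominal lr of 1e-3; in the un-rectified branch step_size multiplies it directly):

lr = 1e-3
step_size = 10.0                  # value at state['step'] == 1 from the table above
effective_scale = lr * step_size  # 1e-2, i.e. ten times the nominal learning rate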
Would it be better to set step_size to 0 when N_sma < self.N_sma_threshhold?
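For concreteness, a sketch of how that suggestion would read in the snippet above (just the proposed change, not a tested fix):

if N_sma >= self.N_sma_threshhold:
    step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4)
                          * (N_sma - 2) / N_sma
                          * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
else:
    # proposed: skip the un-rectified update entirely instead of taking a large step
    step_size = 0.0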
Hi @e-sha - thanks for pointing this out! Offhand, yes, it looks like 0 would be a better result, but I will need to test and see. Can you test it if you have time today? I will try to test it later this evening and can then update if that turns out to be the best option (which it appears to be). I also have some work from a couple of other optimizers that might be better than 0 for the first five steps, but I won't have time to test that until later (see RangerQH, for example). Thanks!