Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
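For readers landing here from the issues, a minimal usage sketch in PyTorch follows. The `from ranger import Ranger` import path and the learning rate value are assumptions; check the repo's README for the exact install and import instructions.

```python
import torch
from ranger import Ranger  # assumed import path; see the repo's README

# Toy model and data, just to show where the optimizer plugs in.
model = torch.nn.Linear(10, 2)
optimizer = Ranger(model.parameters(), lr=1e-3)  # lr value is illustrative

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()  # RAdam update + Gradient Centralization + periodic LookAhead sync
```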

Results: 19 issues

https://paperswithcode.com/paper/adas-adaptive-scheduling-of-stochastic Could it beat rangerLars?

Recent transformer architectures are very prominent in NLP: BERT, GPT-2, RoBERTa, XLNet. Did you try to fine-tune them on some NLP task? If so, what were the best Ranger hyper-parameters...

@lessw2020 Thanks for this awesome optimizer. I'm very excited about it! There is one particular workload that trains using a batch of 1 item. Theoretically, does it make sense to use RAdam...

Hello. First of all, thank you for sharing the code and experiment results. Reading the code, I found that the model uses the fast weights for inference. According to LookAhead, fast...
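If one wants to evaluate with the LookAhead slow weights instead of the fast weights, a hedged sketch follows. It assumes the optimizer keeps the slow weights in its per-parameter state under a `slow_buffer` key; that key name is an assumption about this implementation, so verify it against `ranger.py` before relying on it.

```python
import copy
import torch

def eval_with_slow_weights(model, optimizer):
    """Sketch: temporarily swap in LookAhead's slow weights for evaluation.

    Assumes the optimizer stores the slow weights per parameter under a
    'slow_buffer' state key (assumption; check ranger.py).
    """
    backup = copy.deepcopy(model.state_dict())      # keep the fast weights
    for group in optimizer.param_groups:
        for p in group['params']:
            slow = optimizer.state[p].get('slow_buffer')
            if slow is not None:
                p.data.copy_(slow)                  # swap in the slow weights
    model.eval()
    # ... run validation here ...
    model.load_state_dict(backup)                   # restore the fast weights
    model.train()
```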

I tried Ranger vs. AdamW on single-GPU and 8-GPU setups. While Ranger is better on a single GPU, on the DDP setup it performs worse. Any advice?

To save: `'optimizer': optimizer.state_dict()`; to restore: `optimizer.load_state_dict(checkpoint['optimizer'])`. However, I have the impression that restarting the training always brings the accuracy down before it recovers. Best, Thomas Chaton
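A hedged sketch of the save/restore pattern discussed above; the import path, file name, and model are illustrative, and the comment about what Ranger keeps in its state is an assumption about typical implementations rather than a statement about this repo's exact code.

```python
import torch
from ranger import Ranger  # assumed import path

model = torch.nn.Linear(10, 2)
optimizer = Ranger(model.parameters(), lr=1e-3)

# Save both model and optimizer state. Ranger keeps per-parameter state
# (step counters, moment estimates and, in most implementations, the
# LookAhead slow weights) in its state_dict, so restoring only the model
# weights effectively restarts the RAdam warmup and LookAhead interpolation.
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# Restore:
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```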

Hi all, my colleague and I tried a combination of a (relatively) large Ranger learning rate (say, 0.001) and a large weight decay (say, 0.1). It seems the large decay leads to better...
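As a concrete configuration sketch of the combination described above (import path assumed, model is a placeholder, other arguments left at their defaults):

```python
import torch
from ranger import Ranger  # assumed import path

model = torch.nn.Linear(10, 2)
# Values from the comment above: relatively large lr plus large weight decay.
# Whether this helps is an empirical question per task.
optimizer = Ranger(model.parameters(), lr=1e-3, weight_decay=0.1)
```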

I found that step_size is too high in the initial 5 steps. The problem is in this code: `if N_sma >= self.N_sma_threshhold: step_size = math.sqrt((1 - beta2_t) * (N_sma...`
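For context, here is a paraphrased sketch of the RAdam-style step-size schedule that the snippet above refers to. It is not the repo's exact code; the variable names mirror the snippet, and the hyper-parameter defaults are illustrative.

```python
import math

def radam_step_size(lr, beta1, beta2, t, n_sma_threshold=5):
    """Sketch of the RAdam step-size schedule (not the repo's exact code).

    For the first few steps the variance rectification term is undefined
    (N_sma below the threshold), so the update falls back to a plain
    bias-corrected momentum step; afterwards the rectified, bias-corrected
    step size is used.
    """
    beta2_t = beta2 ** t
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    n_sma = n_sma_max - 2.0 * t * beta2_t / (1.0 - beta2_t)

    if n_sma >= n_sma_threshold:
        rect = math.sqrt(
            (1.0 - beta2_t)
            * (n_sma - 4.0) / (n_sma_max - 4.0)
            * (n_sma - 2.0) / n_sma
            * n_sma_max / (n_sma_max - 2.0)
        )
        return lr * rect / (1.0 - beta1 ** t)
    # Un-rectified branch: bias-corrected momentum step only.
    return lr / (1.0 - beta1 ** t)
```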

It would be very helpful if you could provide an implementation in Keras.