Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
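For readers landing here from the issues, a minimal usage sketch in PyTorch follows. The `from ranger import Ranger` import path and the learning rate value are assumptions; check the repo's README for the exact install and import instructions.

```python
import torch
from ranger import Ranger  # assumed import path; see the repo's README

# Toy model and data, just to show where the optimizer plugs in.
model = torch.nn.Linear(10, 2)
optimizer = Ranger(model.parameters(), lr=1e-3)  # lr value is illustrative

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()  # RAdam update + Gradient Centralization + periodic LookAhead sync
```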

Results: 19 issues

https://paperswithcode.com/paper/adas-adaptive-scheduling-of-stochastic Could it beat rangerLars?

Recent transformer architectures are very prominent in NLP: BERT, GPT-2, RoBERTa, XLNet. Did you try to fine-tune them on some NLP task? If so, what were the best Ranger hyper-parameters...

@lessw2020 Thanks for this awesome optimizer. I'm very excited about it! There is one particular workload that trains using a batch of 1 item. Theoretically, does it make sense to use RAdam...

Hello. First of all, thank you for sharing the code and experiment results. Reading the code, I found that the model uses the fast weights for inference. According to LookAhead, fast...
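If one wants to evaluate with the LookAhead slow weights instead of the fast weights, a hedged sketch follows. It assumes the optimizer keeps the slow weights in its per-parameter state under a `slow_buffer` key; that key name is an assumption about this implementation, so verify it against `ranger.py` before relying on it.

```python
import copy
import torch

def eval_with_slow_weights(model, optimizer):
    """Sketch: temporarily swap in LookAhead's slow weights for evaluation.

    Assumes the optimizer stores the slow weights per parameter under a
    'slow_buffer' state key (assumption; check ranger.py).
    """
    backup = copy.deepcopy(model.state_dict())      # keep the fast weights
    for group in optimizer.param_groups:
        for p in group['params']:
            slow = optimizer.state[p].get('slow_buffer')
            if slow is not None:
                p.data.copy_(slow)                  # swap in the slow weights
    model.eval()
    # ... run validation here ...
    model.load_state_dict(backup)                   # restore the fast weights
    model.train()
```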

I tried Ranger vs. AdamW on single-GPU and 8-GPU setups. While Ranger is better on a single GPU, on the DDP setup it performs worse. Any advice?

To save: `'optimizer': optimizer.state_dict()`; to restore: `optimizer.load_state_dict(checkpoint['optimizer'])`. However, I have the impression that restarting the training always brings the accuracy down before it recovers. Best, Thomas Chaton
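A hedged sketch of the save/restore pattern discussed above; the import path, file name, and model are illustrative, and the comment about what Ranger keeps in its state is an assumption about typical implementations rather than a statement about this repo's exact code.

```python
import torch
from ranger import Ranger  # assumed import path

model = torch.nn.Linear(10, 2)
optimizer = Ranger(model.parameters(), lr=1e-3)

# Save both model and optimizer state. Ranger keeps per-parameter state
# (step counters, moment estimates and, in most implementations, the
# LookAhead slow weights) in its state_dict, so restoring only the model
# weights effectively restarts the RAdam warmup and LookAhead interpolation.
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# Restore:
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```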

Hi all, my colleague and I tried a combination of a (relatively) large Ranger learning rate (say, 0.001) and a large weight decay (say, 0.1). It seems the large decay leads to better...
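As a concrete configuration sketch of the combination described above (import path assumed, model is a placeholder, other arguments left at their defaults):

```python
import torch
from ranger import Ranger  # assumed import path

model = torch.nn.Linear(10, 2)
# Values from the comment above: relatively large lr plus large weight decay.
# Whether this helps is an empirical question per task.
optimizer = Ranger(model.parameters(), lr=1e-3, weight_decay=0.1)
```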

I found that step_size is too high in the initial 5 steps. The problem is in this code: `if N_sma >= self.N_sma_threshhold: step_size = math.sqrt((1 - beta2_t) * (N_sma...`
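For context, here is a paraphrased sketch of the RAdam-style step-size schedule that the snippet above refers to. It is not the repo's exact code; the variable names mirror the snippet, and the hyper-parameter defaults are illustrative.

```python
import math

def radam_step_size(lr, beta1, beta2, t, n_sma_threshold=5):
    """Sketch of the RAdam step-size schedule (not the repo's exact code).

    For the first few steps the variance rectification term is undefined
    (N_sma below the threshold), so the update falls back to a plain
    bias-corrected momentum step; afterwards the rectified, bias-corrected
    step size is used.
    """
    beta2_t = beta2 ** t
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    n_sma = n_sma_max - 2.0 * t * beta2_t / (1.0 - beta2_t)

    if n_sma >= n_sma_threshold:
        rect = math.sqrt(
            (1.0 - beta2_t)
            * (n_sma - 4.0) / (n_sma_max - 4.0)
            * (n_sma - 2.0) / n_sma
            * n_sma_max / (n_sma_max - 2.0)
        )
        return lr * rect / (1.0 - beta1 ** t)
    # Un-rectified branch: bias-corrected momentum step only.
    return lr / (1.0 - beta1 ** t)
```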

It would be very helpful if you could provide an implementation in Keras.