Ranger-Deep-Learning-Optimizer
Does it make sense to use it on a batch of 1?
@lessw2020 Thanks for this awesome optimizer. I'm very excited about it!
There is one particular workload that trains using a batch of 1 item. Theoretically, does it make sense to use RAdam (Rectified Adam), LookAhead, and GC (Gradient Centralization) in this context?
I've been thinking about it and read the papers, but I still couldn't reach a conclusion. Since you (or anyone else here) are much more experienced than me, do you have an opinion on this?
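For reference, here's roughly how I'm wiring it up - just a minimal sketch with a toy model and random data, assuming the `from ranger import Ranger` import from this repo's README:

```python
import torch
import torch.nn as nn
from ranger import Ranger  # optimizer from this repo (see README for install)

# toy model and data, only to illustrate a batch-of-1 training loop
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
optimizer = Ranger(model.parameters(), lr=1e-3)  # LookAhead k/alpha left at defaults

for step in range(100):
    x = torch.randn(1, 16)  # batch size 1
    y = torch.randn(1, 1)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```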
Hi @bratao - it would still make sense to use Ranger, but my recommendation is to pair it with MABN (Moving Average Batch Normalization).
MABN maintains moving averages of the batch statistics across iterations, and the authors show, for example, that a batch size of 2 can match the accuracy of batch size 32, whereas standard BatchNorm normally sees a large accuracy drop at small batch sizes.
I am planning to test it out this week, so I don't have proof it works yet, but the paper looks strong and the idea is solid.
https://arxiv.org/abs/2001.06838
Their code is linked there, though as I recall it will likely need to be extracted out of their framework.
Anyway, it's on my todo list, and maybe I can pull it out and make it a pluggable item.
Regardless, that is imo the best way to address the batch size 1 issue.
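In case it's useful before I get to it, here's a rough sketch of the core idea - normalizing with exponential moving average statistics instead of per-batch statistics. This is my simplified reading of the paper, not their code: the real MABN also smooths the statistics used in the backward pass, which this forward-only sketch skips.

```python
import torch
import torch.nn as nn

class SimpleMABN2d(nn.Module):
    """Simplified sketch of the MABN idea (arXiv:2001.06838).

    Normalizes with exponential moving averages of the batch statistics,
    so even batch size 1 sees stable estimates. NOTE: the real MABN also
    replaces the batch statistics used in the backward pass; this sketch
    is just to show the concept.
    """

    def __init__(self, num_features, momentum=0.98, eps=1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.register_buffer('running_mean', torch.zeros(1, num_features, 1, 1))
        self.register_buffer('running_var', torch.ones(1, num_features, 1, 1))

    def forward(self, x):
        if self.training:
            # update the moving averages from the current (tiny) batch
            mean = x.mean(dim=(0, 2, 3), keepdim=True)
            var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
            with torch.no_grad():
                self.running_mean.lerp_(mean, 1 - self.momentum)
                self.running_var.lerp_(var, 1 - self.momentum)
        # normalize with the moving averages, not the per-batch stats
        x_hat = (x - self.running_mean) / torch.sqrt(self.running_var + self.eps)
        return self.weight * x_hat + self.bias
```

To make it pluggable, you'd swap this in wherever the model currently uses nn.BatchNorm2d.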
Hope that helps!
I'll leave this open to track my testing results on MABN - please post if you try it before I get to it :)
@lessw2020 I know that I'm just a beggar, but the first thing I do every morning is open this issue to check if you got to MABN.
Good vibes from an anxious fan ☮️