Ranger-Deep-Learning-Optimizer
Does it make sense to use it on a batch of 1?
@lessw2020 Thanks for this awesome optimizer. I'm very excited about it!
There is one particular workload that trains using a batch of 1 item. Theoretically, does it make sense to use RAdam (Rectified Adam), LookAhead, and GC (Gradient Centralization) in this context?
I've been thinking about it and read the papers, but I still couldn't reach a conclusion. Since you (or anyone else here) are much more experienced than me, do you have an opinion on this?
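For reference, here's roughly how I'm wiring it up - just a minimal sketch with a toy model and random data, assuming the `from ranger import Ranger` import from this repo's README:

```python
import torch
import torch.nn as nn
from ranger import Ranger  # optimizer from this repo (see README for install)

# toy model and data, only to illustrate a batch-of-1 training loop
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
optimizer = Ranger(model.parameters(), lr=1e-3)  # LookAhead k/alpha left at defaults

for step in range(100):
    x = torch.randn(1, 16)  # batch size 1
    y = torch.randn(1, 1)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```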
Hi @bratao - it would still make sense to use Ranger, but my recommendation is to pair it with MABN (Moving Average Batch Normalization).
MABN maintains moving averages of the batch statistics across iterations, and the authors show, for example, that a batch size of 2 can match the accuracy of batch size 32, whereas standard BatchNorm normally sees a large accuracy drop at small batch sizes.
I am planning to test it out this week, so I don't have proof it works yet, but the paper looks strong and the idea is solid.
https://arxiv.org/abs/2001.06838
Their code is linked there, though as I recall it will likely need to be extracted out of their framework.
Anyway, it's on my todo list, and maybe I can pull it out and make it a pluggable item.
Regardless, that is imo the best way to address the batch size 1 issue.
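In case it's useful before I get to it, here's a rough sketch of the core idea - normalizing with exponential moving average statistics instead of per-batch statistics. This is my simplified reading of the paper, not their code: the real MABN also smooths the statistics used in the backward pass, which this forward-only sketch skips.

```python
import torch
import torch.nn as nn

class SimpleMABN2d(nn.Module):
    """Simplified sketch of the MABN idea (arXiv:2001.06838).

    Normalizes with exponential moving averages of the batch statistics,
    so even batch size 1 sees stable estimates. NOTE: the real MABN also
    replaces the batch statistics used in the backward pass; this sketch
    is just to show the concept.
    """

    def __init__(self, num_features, momentum=0.98, eps=1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.register_buffer('running_mean', torch.zeros(1, num_features, 1, 1))
        self.register_buffer('running_var', torch.ones(1, num_features, 1, 1))

    def forward(self, x):
        if self.training:
            # update the moving averages from the current (tiny) batch
            mean = x.mean(dim=(0, 2, 3), keepdim=True)
            var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
            with torch.no_grad():
                self.running_mean.lerp_(mean, 1 - self.momentum)
                self.running_var.lerp_(var, 1 - self.momentum)
        # normalize with the moving averages, not the per-batch stats
        x_hat = (x - self.running_mean) / torch.sqrt(self.running_var + self.eps)
        return self.weight * x_hat + self.bias
```

To make it pluggable, you'd swap this in wherever the model currently uses nn.BatchNorm2d.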
Hope that helps!
I'll leave this open to track my testing results on MABN - please post if you try it before I get to it :)
@lessw2020 I know that I'm just a beggar, but the first thing I do every morning is open this issue to check if you got to MABN.
Good vibes from an anxious fan ☮️