MOE
[C++] implement Polyak-Ruppert averaging for gradient descent
The C++ gradient descent code currently does not support any kind of iterate averaging.
We should implement Polyak-Ruppert averaging. This is already done in moe.optimal_learning.python.python_version.optimization.GradientDescentOptimizer.optimize
so porting it should be straightforward.
The lack of averaging hasn't proven to be much of a hindrance so far: the results obtained in Python with and without averaging have been comparable (i.e., the final gradient hasn't been much better either way). Still, we should be consistent between the two implementations, and averaging is generally a good idea.
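
For reference, here is a rough sketch of what the averaging could look like on the C++ side. It is not taken from the existing MOE code: the struct `AveragedGDParameters`, the function `GradientDescentWithAveraging`, the `GradientFunction` alias, and the step size schedule `pre_mult * i^(-gamma)` are all hypothetical names and assumptions made for illustration. It also minimizes, so the sign of the update would need flipping wherever the real optimizer maximizes. The idea is simply to run plain gradient descent and return the mean of the last `num_steps_averaged` iterates instead of the final iterate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical parameter struct; field names are illustrative assumptions,
// not the actual MOE C++ parameter struct.
struct AveragedGDParameters {
  int max_num_steps;       // total number of gradient descent iterations
  int num_steps_averaged;  // number of trailing iterates to average (0 disables averaging)
  double pre_mult;         // scale factor for the step size
  double gamma;            // decay exponent: step_i = pre_mult * i^(-gamma), i = 1, 2, ...
};

// Hypothetical gradient callback: maps a point to the gradient at that point.
using GradientFunction = std::function<std::vector<double>(const std::vector<double>&)>;

// Minimizes by iterating x_{i+1} = x_i - step_i * grad(x_i).
// Returns the mean of the last num_steps_averaged iterates (Polyak-Ruppert
// averaging over a trailing window), or the final iterate if averaging is disabled.
std::vector<double> GradientDescentWithAveraging(const GradientFunction& gradient,
                                                 std::vector<double> x,
                                                 const AveragedGDParameters& params) {
  assert(params.max_num_steps > 0);
  assert(params.num_steps_averaged >= 0 &&
         params.num_steps_averaged <= params.max_num_steps);

  std::vector<double> average(x.size(), 0.0);
  // 0-based index of the first iteration whose result enters the average.
  const int first_averaged_step = params.max_num_steps - params.num_steps_averaged;

  for (int i = 0; i < params.max_num_steps; ++i) {
    const double step_size =
        params.pre_mult * std::pow(static_cast<double>(i + 1), -params.gamma);
    const std::vector<double> grad = gradient(x);
    assert(grad.size() == x.size());

    for (std::size_t j = 0; j < x.size(); ++j) {
      x[j] -= step_size * grad[j];
    }

    if (i >= first_averaged_step) {
      // Running mean, so the trailing iterates never need to be stored.
      const double count = static_cast<double>(i - first_averaged_step + 1);
      for (std::size_t j = 0; j < x.size(); ++j) {
        average[j] += (x[j] - average[j]) / count;
      }
    }
  }
  return params.num_steps_averaged > 0 ? average : x;
}
```

Keeping a running mean avoids storing the trailing iterates. Setting `num_steps_averaged` to 0 in this sketch reproduces unaveraged behavior, which would make it easy to compare results with and without averaging.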