The feedback feature should be restored (by the way, it was used in LeCun's NIPS'11 optimization challenge).
The constant η0 is determined by performing preliminary experiments on a data subsample (see http://leon.bottou.org/projects/sgd). We could also have `asgd.tune_...()` methods to "tune" speed and accuracy (here step_size0 would be...
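As a hedged sketch of what such a tuning helper could look like (the name `tune_sgd_step_size0`, the `clf_factory` callable, and the scikit-learn-style `fit()`/`score()` interface are all assumptions, not the repo's actual API):

```python
import numpy as np

def tune_sgd_step_size0(clf_factory, X, y,
                        candidates=(1e-4, 1e-3, 1e-2, 1e-1, 1.0),
                        subsample_size=1000, seed=0):
    """Pick sgd_step_size0 by short preliminary fits on a data subsample."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))[:subsample_size]
    Xs, ys = X[idx], y[idx]
    best_step, best_score = None, -np.inf
    for step in candidates:
        clf = clf_factory(sgd_step_size0=step)
        clf.fit(Xs, ys)
        score = clf.score(Xs, ys)  # accuracy on the subsample
        if score > best_score:
            best_step, best_score = step, score
    return best_step
```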
See the sparsity trick from Bottou.
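A minimal sketch of that trick, assuming hinge loss with L2 regularization (function name and signature are illustrative): the weights are stored as w = scale * v, so the L2 decay touches only the scalar while a sparse example touches only its nonzero coordinates.

```python
import numpy as np

def sparse_sgd_step(v, scale, x_idx, x_val, y, eta, lmbda):
    """One SGD step on a sparse example given as (indices, values).

    The full weight vector is w = scale * v; `scale` starts at 1.0.
    """
    scale *= 1.0 - eta * lmbda                 # L2 decay: O(1), not O(D)
    margin = y * scale * np.dot(v[x_idx], x_val)
    if margin < 1.0:                           # hinge constraint violated
        v[x_idx] += (eta * y / scale) * x_val  # touch only nonzero coords
    return v, scale
```

In practice `scale` shrinks toward zero over time, so implementations occasionally fold it back into v to keep the division well-conditioned.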
The learning rate has the form η_t = η0 / (1 + λ η0 t)^0.75, where λ is the regularization constant. See: http://leon.bottou.org/projects/sgd
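Written as code, the schedule is simply:

```python
def sgd_step_size(t, sgd_step_size0, l2_regularization):
    """eta_t = eta0 / (1 + lambda * eta0 * t) ** 0.75 (Bottou's schedule)."""
    return sgd_step_size0 / (
        1.0 + l2_regularization * sgd_step_size0 * t) ** 0.75
```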
"sphere" the data and merge in the weights
The idea is to boost performance by "disabling" the averaging until it becomes useful. Start with exp_moving_asgd_step_size=1e-2?
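One possible reading of this note, as a sketch: replace ASGD's plain running average with an exponential moving average whose fixed rate makes the averaged weights track the fast-moving SGD weights early on (averaging effectively disabled) and smooth them out once the iterates settle. The parameter name comes from the note above; the update rule itself is an assumption.

```python
def update_averaged_weights(asgd_weights, sgd_weights,
                            exp_moving_asgd_step_size=1e-2):
    """asgd_w <- (1 - rate) * asgd_w + rate * sgd_w"""
    rate = exp_moving_asgd_step_size
    return (1.0 - rate) * asgd_weights + rate * sgd_weights
```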
Multiple (sgd_step_size0, l2_regularization) pairs could be given, and the `*fit()` methods could use BLAS Level-3 operations when appropriate to allow for more data re-use and speed up the computation. This is confusing...
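A sketch of the data-reuse idea (not the repo's implementation): stacking the K weight vectors into a (K, D) matrix turns the per-example dot products for all hyperparameter settings into a single GEMM per mini-batch.

```python
import numpy as np

def batched_margins(W, b, X, y):
    """Margins y_i * (w_k . x_i + b_k) for all K settings at once.

    W: (K, D) stacked weights, b: (K,) biases,
    X: (B, D) mini-batch, y: (B,) labels in {-1, +1}.
    Returns a (B, K) margin matrix; np.dot(X, W.T) is one BLAS-3 call.
    """
    return y[:, None] * (np.dot(X, W.T) + b[None, :])
```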
To decrease communication and speed up convergence, we should have an option (default=True) to only update the weights when the margin constraint has been violated, e.g.: Line #66 should move up (to...
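A sketch of what that option could look like for a single hinge-loss SGD step (the flag name `update_on_violation_only` is hypothetical). Note that skipping the step also defers the L2 decay, which slightly changes the iterates:

```python
import numpy as np

def sgd_step(weights, bias, x, y, eta, lmbda, update_on_violation_only=True):
    margin = y * (np.dot(weights, x) + bias)
    if update_on_violation_only and margin >= 1.0:
        return weights, bias                       # constraint satisfied: skip
    weights = (1.0 - eta * lmbda) * weights + eta * y * x
    bias = bias + eta * y
    return weights, bias
```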
It would be useful to have the possibility of using mini-batches to get better estimates of the gradients (see https://github.com/npinto/asgd/blob/master/asgd/naive_asgd.py#L60). Since we'll be using BLAS, etc., this parameter could possibly...
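A sketch of the idea (this is not the code behind the link above): averaging the hinge-loss sub-gradient over a small batch lowers its variance and maps the inner products onto BLAS calls.

```python
import numpy as np

def minibatch_hinge_gradient(weights, bias, X, y):
    """Average sub-gradient of the hinge loss over a (B, D) mini-batch."""
    margins = y * (np.dot(X, weights) + bias)   # (B,) inner products via BLAS
    coef = (y * (margins < 1.0)) / len(y)       # zero where margin satisfied
    grad_w = -np.dot(coef, X)                   # average over violated examples
    grad_b = -coef.sum()
    return grad_w, grad_b
```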