sgd
An R package for large-scale estimation with stochastic gradient descent
Ideas: run a chi-squared test sequentially after each batch of iterations to check convergence. This could also serve as an early-stopping rule for SGD, rather than running all...
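A minimal sketch of the sequential convergence check, under assumptions not stated in the idea above. The null hypothesis is taken to be "the mean stochastic gradient is zero", tested on the per-sample gradients collected during the last batch. Coordinates are treated as independent, so the statistic is roughly chi-squared with p degrees of freedom. The function names, the 1/t learning rate, and the mean-estimation toy problem are all hypothetical illustrations, not the package's method.

```python
import math
import random


def chi2_convergence_stat(grads):
    """Sum over coordinates j of n * mean_j**2 / var_j for a batch of gradients.

    Assumption: coordinates are treated as independent, so under H0
    (mean gradient is zero) the statistic is roughly chi-squared with
    p = len(grads[0]) degrees of freedom. Needs at least 2 gradients.
    """
    n = len(grads)
    p = len(grads[0])
    stat = 0.0
    for j in range(p):
        col = [g[j] for g in grads]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        if var == 0.0:
            if mean != 0.0:
                return math.inf  # constant nonzero gradient: clearly not converged
            continue
        stat += n * mean * mean / var
    return stat


def has_converged(grads, critical_value):
    """Treat SGD as converged when H0 (zero mean gradient) is not rejected."""
    return chi2_convergence_stat(grads) <= critical_value


def sgd_mean_with_early_stop(data, batch_size=100, critical_value=3.841,
                             max_iter=10_000, seed=0):
    """Estimate the mean of `data` by SGD on 0.5 * (y - theta)**2, stopping early.

    3.841 is the 0.95 quantile of the chi-squared distribution with 1 df.
    Gradients within a batch are evaluated at slightly different iterates,
    which this sketch ignores.
    """
    rng = random.Random(seed)
    theta, t = 0.0, 0
    while t < max_iter:
        batch_grads = []
        for _ in range(batch_size):
            t += 1
            y = rng.choice(data)
            g = theta - y          # gradient of 0.5 * (y - theta)**2
            theta -= g / t         # learning rate 1/t
            batch_grads.append([g])
        if has_converged(batch_grads, critical_value):
            break                  # stop early instead of running all iterations
    return theta, t


print(sgd_mean_with_early_stop([1.0, 2.0, 3.0], batch_size=50))
```

The test only looks at a batch at a time, so it adds O(batch_size * p) work per check; how often to run it and which significance level to use are open choices.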
Should be as fast as possible. For now, do grid search. Possible extension:
- implementation using Bayesian optimization, cf. Ryan Adams's work
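A minimal grid-search sketch, assuming the hyperparameter being tuned is a constant SGD learning rate for a one-feature linear regression and that candidates are scored by validation MSE. The helpers `sgd_linear`, `mse`, and `grid_search` are hypothetical names used only for illustration.

```python
import math
import random


def sgd_linear(xs, ys, lr, epochs=20, seed=0):
    """Fit y ~ w * x + b by plain SGD with a constant learning rate `lr`."""
    rng = random.Random(seed)
    w = b = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            err = (w * xs[i] + b) - ys[i]  # residual for one observation
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = w * x + b - y
        total += r * r
    return total / len(xs)


def grid_search(train, valid, grid):
    """Return (best_lr, best_valid_mse) over the candidate learning rates in `grid`."""
    best = None
    for lr in grid:
        w, b = sgd_linear(*train, lr=lr)
        score = mse(w, b, *valid)
        if not math.isfinite(score):  # this learning rate diverged: skip it
            continue
        if best is None or score < best[1]:
            best = (lr, score)
    return best


xs = [i / 49 for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
train = (xs[::2], ys[::2])
valid = (xs[1::2], ys[1::2])
print(grid_search(train, valid, [0.001, 0.01, 0.1, 1.0, 10.0]))
```

Each grid point costs a full SGD run, which is why a smarter search (such as Bayesian optimization) is attractive once the grid gets large.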
Why does the square root in AdaGrad empirically give better performance? Or does it? To be analyzed!
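A small experiment harness for this question, assuming per-coordinate AdaGrad with the exponent exposed as a parameter (`power=0.5` is the usual square root; `power=1.0` is one alternative to compare against). It also exposes one property that is easy to check: with `power=0.5`, rescaling a coordinate's gradient by a constant c leaves that coordinate's update unchanged (G becomes c²G), while with `power=1.0` the update shrinks by 1/c. Whether this explains the empirical behaviour is not established here. The function name and the ill-scaled quadratic are hypothetical.

```python
def adagrad(grad_fn, theta0, lr=0.5, power=0.5, eps=1e-8, n_iter=200):
    """Per-coordinate AdaGrad: theta_j -= lr * g_j / (G_j + eps) ** power.

    G_j is the running sum of squared gradients for coordinate j.
    power=0.5 is standard AdaGrad; other powers are for experimentation.
    """
    theta = list(theta0)
    G = [0.0] * len(theta)
    for _ in range(n_iter):
        g = grad_fn(theta)
        for j in range(len(theta)):
            G[j] += g[j] * g[j]
            theta[j] -= lr * g[j] / (G[j] + eps) ** power
    return theta


def grad_quadratic(th):
    """Gradient of 0.5 * (th[0]**2 + 100 * th[1]**2): coordinate 2 is scaled by 100."""
    return [th[0], 100.0 * th[1]]


# With the square root both coordinates follow the same path despite the scaling;
# with power=1.0 the steeply scaled coordinate moves much more slowly.
print(adagrad(grad_quadratic, [1.0, 1.0], power=0.5))
print(adagrad(grad_quadratic, [1.0, 1.0], power=1.0))
```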
I want to try the `sgd` package; it seems to provide a lot of options and must-have features. But why doesn't it work with sparse matrices (`Matrix` package, especially the `dgCMatrix` class,...
E.g., the default is uniform draws; another option lets you specify a probability vector of dimension N to assign weights for a multinomial draw; another does importance sampling/active learning,...
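A minimal sketch of the first two sampling options plus the usual reweighting that keeps a non-uniform draw unbiased. The assumption is that the objective is the average loss (1/N) Σ loss_i; if index i is drawn with probability p_i, multiplying its gradient by 1/(N p_i) gives the same expectation as uniform sampling. `draw_index` and `importance_weight` are hypothetical names, and no active-learning scheme is implemented.

```python
import random


def draw_index(n, rng, probs=None):
    """Draw one observation index: uniform by default, multinomial if `probs` is given."""
    if probs is None:
        return rng.randrange(n)
    return rng.choices(range(n), weights=probs, k=1)[0]


def importance_weight(i, n, probs=None):
    """Weight 1 / (n * p_i) that keeps the stochastic gradient unbiased for the
    average loss when index i was drawn with probability p_i."""
    if probs is None:
        return 1.0
    return 1.0 / (n * probs[i])


rng = random.Random(0)
probs = [0.1, 0.2, 0.3, 0.4]
grads = [1.0, 2.0, 3.0, 4.0]  # toy per-observation gradients
i = draw_index(len(probs), rng, probs)
step_grad = importance_weight(i, len(probs), probs) * grads[i]

# Exact expectation of the reweighted gradient under this sampling scheme;
# it matches the plain average of `grads` up to floating-point rounding.
expected = sum(p * importance_weight(k, len(probs), probs) * grads[k]
               for k, p in enumerate(probs))
print(i, step_grad, expected)
```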
Randomly initialize near 0 by adding normally distributed noise epsilon with a standard deviation of, say, `eps=1e-5`. See #58. To tune hyperparameters: run SGD to get the best estimates for a particular choice of hyperparameters. Then...
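The initialization rule above amounts to one draw per coordinate. A minimal sketch, where `init_params` is a hypothetical helper and `eps` is the standard deviation as in the text; passing a seed makes each restart reproducible.

```python
import random


def init_params(p, eps=1e-5, seed=None):
    """Start each of the p coordinates at 0 plus N(0, eps**2) noise (eps is the std dev)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, eps) for _ in range(p)]


theta0 = init_params(3, seed=1)
print(theta0)
```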
These must allow one to specify multiple `sgd` objects to plot.
- [x] MSE
- [x] Classification error
- [ ] Evaluation of cost function available x-axis for each of...
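The data side of such a plot can be sketched as one metric trace per fitted object. This assumes each fit stores a hypothetical history of prediction vectors, one per saved iteration; the returned dict (name to list of metric values) is what a plotting routine would draw as one line per object against iteration. No plotting library is assumed, and all names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) * (t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_error(y_true, y_pred):
    """Fraction of mismatched labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def metric_traces(fits, metric, y_true):
    """Compute `metric` at every stored iterate of every fit.

    `fits` is a hypothetical dict: name -> list of prediction vectors,
    one per stored iteration.
    """
    return {name: [metric(y_true, preds) for preds in history]
            for name, history in fits.items()}


y = [1, 0, 1, 1]
fits = {"fit_a": [[0, 0, 0, 0], [1, 0, 1, 1]], "fit_b": [[1, 1, 1, 1]]}
print(metric_traces(fits, classification_error, y))
print(metric_traces(fits, mse, y))
```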
I believe the user should have the following options for the learning rate:
- [ ] Manual: it should be possible to set the learning rate manually
- [ ] Auto-1dim:...
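A sketch of how a learning-rate option could be resolved into a function of the iteration t. The "manual" branch follows the first item above (a constant or a user-supplied function). The "Auto-1dim" item is cut off in the source, so it is not implemented; the `"decay"` branch instead uses a common decaying form gamma0 * (1 + alpha * gamma0 * t) ** (-c), chosen here as an assumption. `make_learning_rate` and its parameters are hypothetical.

```python
def make_learning_rate(kind="manual", rate=None, gamma0=1.0, alpha=1.0, c=1.0):
    """Return a function t -> learning rate for iteration t (t = 1, 2, ...).

    kind="manual": `rate` is either a constant or a user function of t.
    kind="decay": hypothetical schedule gamma0 * (1 + alpha * gamma0 * t) ** (-c).
    """
    if kind == "manual":
        if rate is None:
            raise ValueError("manual learning rate requires `rate`")
        if callable(rate):
            return rate
        return lambda t: float(rate)
    if kind == "decay":
        return lambda t: gamma0 * (1.0 + alpha * gamma0 * t) ** (-c)
    raise ValueError(f"unknown learning-rate kind: {kind!r}")


constant = make_learning_rate("manual", 0.1)
custom = make_learning_rate("manual", lambda t: 1.0 / t)
decay = make_learning_rate("decay", gamma0=1.0, alpha=1.0, c=1.0)
print(constant(5), custom(4), decay(1))
```

Returning a plain function keeps the SGD loop itself unaware of which option was chosen.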