                        [Python] [arXiv/cs] Paper "An Overview of Gradient Descent Optimization Algorithms" by Sebastian Ruder
Paper-Implementation-Overview-Gradient-Descent-Optimization-Algorithms
arXiv paper :
An Overview of Gradient Descent Optimization Algorithms - Sebastian Ruder
Python 2.7
Links to the original paper published on arXiv.org > cs > arXiv:1609.04747 : [1], [2]
Link to Blog with Paper Explanation : [3]
Implemented the following Gradient Descent Optimization Algorithms from scratch:
- Vanilla Batch/Stochastic Gradient Descent [4] : batch_gradient_descent.py
- Momentum [5] : momentum.py
- NAG : Nesterov Accelerated Gradient [6] : nesterov_accelarated_gradient.py
- AdaGrad : Adaptive Gradient Algorithm [7] : adagrad.py
- AdaDelta : Adaptive Learning Rate Method [8] : adadelta.py
- RMSProp [9] : rms_prop.py
- AdaMax : Infinity Norm based Adaptive Moment Estimation [12] : adamax.py
- Nadam : Nesterov-accelerated Adaptive Moment Estimation [13] : nadam.py
- AMSGrad [14] : amsgrad.py
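
For orientation, here is a minimal, self-contained sketch of three of the update rules above, applied to a scalar parameter. It is illustrative only: the function names, signatures, and stopping criterion are assumptions, not the actual code in batch_gradient_descent.py, momentum.py, or amsgrad.py.

```python
import math

def gradient_descent(grad, x0, lr=0.1, tol=1e-4, max_iters=10000):
    """Vanilla update: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(max_iters):
        step = lr * grad(x)
        x -= step
        if abs(step) < tol:          # stop once the update falls below tolerance
            return x
    return x

def momentum(grad, x0, lr=0.1, gamma=0.9, tol=1e-4, max_iters=10000):
    """Momentum: v <- gamma * v + lr * grad(x); x <- x - v."""
    x, v = x0, 0.0
    for _ in range(max_iters):
        v = gamma * v + lr * grad(x)
        x -= v
        if abs(v) < tol:
            return x
    return x

def amsgrad(grad, x0, lr=0.1, beta_1=0.9, beta_2=0.999, eps=1e-8,
            tol=1e-4, max_iters=10000):
    """AMSGrad: Adam-style moments with a non-decreasing second-moment estimate."""
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(max_iters):
        g = grad(x)
        m = beta_1 * m + (1.0 - beta_1) * g          # first moment (mean of gradients)
        v = beta_2 * v + (1.0 - beta_2) * g * g      # second moment (uncentered variance)
        v_hat = max(v_hat, v)                        # keep the running maximum of v
        step = lr * m / (math.sqrt(v_hat) + eps)
        x -= step
        if abs(step) < tol:
            return x
    return x
```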
Time and Error Analysis :
Minimized the dummy cost function f(x) = x^2 using the default values: initial approximation = 1, error tolerance = 0.0001, learning rate = 0.1, gamma = 0.9, beta_1 = 0.9, beta_2 = 0.999.
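
A hypothetical way to reproduce this setup with the vanilla update, timing the run and counting iterations. The convergence test shown here (update magnitude below the tolerance) is an assumption; the repo's scripts may stop and time differently.

```python
import time

def f(x):
    return x ** 2            # dummy cost function

def grad_f(x):
    return 2.0 * x           # analytic gradient of f(x) = x^2

x, lr, tol = 1.0, 0.1, 1e-4  # initial approximation, learning rate, error tolerance
start, iterations = time.time(), 0
while abs(lr * grad_f(x)) >= tol:    # stop when the update falls below the tolerance
    x -= lr * grad_f(x)
    iterations += 1
elapsed = time.time() - start
print("x* ~ %.6f, f(x*) ~ %.8f, %d iterations, %.4f s" % (x, f(x), iterations, elapsed))
```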

