dnn_opt
Stochastic Gradient Descent algorithm
Gradient-based algorithms are the default training algorithms for ANNs. Hence, providing support for such algorithms (SGD, ADAM, RMSProp, etc.) is critical for out-of-the-box benchmarking capabilities. Our suggestion is to proceed as follows:
- [ ] Create a new class in the `base` package (e.g. `class derivable : public solution {}`) that will inherit from the `solution` class. `derivable` should have an array of floats (e.g. `derivable::m_df`) that represents the derivative of the `solution::fitness()` function with respect to each parameter in `solution::get_params()`. Hence, the size of `derivable::get_df()` will be `solution::size()`. (A sketch of this class follows the list.)
- [ ] Define a getter in the `derivable` class (e.g. `derivable::df()`) that, in case the solution was modified, will calculate the derivative of the fitness function and store the result in the `derivable::m_df` array. In case the solution was not modified, it will simply return `derivable::m_df`. The implementation of this method should follow the same pattern as the current `solution::fitness()` method.
- [ ] As in the case of `solution::fitness()` and `solution::calculate_fitness()`, consider an implementation of `derivable::df()` together with a protected `virtual derivable::calculate_df() = 0` method in `derivable`. It is probably a good idea not to allocate `derivable::m_df` before the first `derivable::df()` call, in case the derivative is never used.
- [ ] Each child of `derivable` in the `solutions` package should re-implement its own version of `virtual derivable::calculate_df() = 0` according to its fitness function (only if the fitness function is differentiable, of course). This means that the `network` class should inherit from `derivable` instead of `solution` and implement `virtual derivable::calculate_df() = 0`.
- [ ] The `network::calculate_df()` implementation will call a `layer::backprop()` method defined in the `layer` class, passing the position in the `derivable::m_df` array where the layer will store the derivative of its corresponding parameters. The `layer::backprop()` method should be similar to the current `layer::prop()` method. (See the `network`/`layer` sketch after this list.)
- [ ] Each child of `layer` in the `layers` package should re-implement its own version of `virtual layer::backprop() = 0`. Currently there should be a single layer, `fc` (fully connected layer), implemented in the library.
- [ ] Create a new class in the `algorithms` package (e.g. `class sgd : public algorithm`) that, using the derivative and the fitness function of a `derivable` solution, implements the Stochastic Gradient Descent algorithm. (A sketch follows the list.)
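
Below is a minimal sketch of the proposed `derivable` class (items 1–3). The `solution` members used here (`size()`, `get_params()`, `fitness()`, `calculate_fitness()`) are reduced stand-ins for whatever the library actually provides; their names and signatures are assumptions made for illustration only, not the library's real API.

```cpp
class solution  // reduced stand-in for dnn_opt's solution base class
{
public:
  virtual int    size() const = 0;   // number of parameters
  virtual float* get_params()  = 0;  // parameter array, size() entries
  virtual float  fitness()     = 0;  // cached fitness, recomputed when modified
  virtual ~solution() = default;

protected:
  virtual float calculate_fitness() = 0;
};

class derivable : public solution
{
public:
  // Returns the derivative of fitness() with respect to each parameter.
  // The array is allocated lazily and recomputed only when stale, mirroring
  // the fitness() / calculate_fitness() pattern described above.
  virtual float* df()
  {
    if (m_df == nullptr)
    {
      m_df = new float[size()];   // postponed until the first call
      m_df_stale = true;
    }
    if (m_df_stale)
    {
      calculate_df();
      m_df_stale = false;
    }
    return m_df;
  }

  float* get_df() { return m_df; }   // raw access, size() entries

  virtual ~derivable() { delete[] m_df; }

protected:
  // Concrete solutions fill m_df according to their own fitness function.
  virtual void calculate_df() = 0;

  float* m_df       = nullptr;
  bool   m_df_stale = true;   // hypothetical dirty flag, analogous to fitness caching
};
```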
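
A sketch of items 4–6, building on the `derivable` sketch above: `network` inherits from `derivable` and delegates the derivative computation to its layers. The `layer` interface shown here (`count()`, the `backprop()` signature) and the `m_layers` member are hypothetical; the existing `prop()` signature and the rest of `network` are elided.

```cpp
#include <vector>

class layer
{
public:
  virtual int count() const = 0;   // trainable parameters in this layer

  // Backward pass: writes the derivative of the fitness with respect to this
  // layer's parameters into df, starting at position offset. Each concrete
  // layer (currently only fc, the fully connected layer) would implement it
  // alongside its existing prop() method.
  virtual void backprop(float* df, int offset) = 0;

  virtual ~layer() = default;
};

class network : public derivable
{
protected:
  void calculate_df() override
  {
    // Each layer stores the derivative of its own parameters in the slice of
    // m_df that corresponds to its position in the parameter vector. How the
    // upstream gradient is threaded between layers is left to the actual
    // backpropagation implementation.
    int offset = 0;
    for (layer* l : m_layers)
    {
      l->backprop(m_df, offset);
      offset += l->count();
    }
  }

  std::vector<layer*> m_layers;
};
```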
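
Finally, a sketch of item 7: an `sgd` algorithm that uses a `derivable` solution's derivative to update its parameters. The `algorithm` base class shown is a reduced stand-in and the `optimize()` name is an assumption; mini-batch handling, momentum, and stopping criteria are omitted.

```cpp
class algorithm   // reduced stand-in for dnn_opt's algorithm base class
{
public:
  virtual void optimize() = 0;
  virtual ~algorithm() = default;
};

class sgd : public algorithm
{
public:
  sgd(derivable* s, float learning_rate) : m_solution(s), m_lr(learning_rate) { }

  // One SGD step: move every parameter a small step against its derivative.
  void optimize() override
  {
    float* params = m_solution->get_params();
    float* df     = m_solution->df();   // triggers calculate_df() when stale

    for (int i = 0; i < m_solution->size(); ++i)
    {
      params[i] -= m_lr * df[i];
    }
    // The solution would then be flagged as modified so that fitness() and
    // df() are recomputed on their next call.
  }

protected:
  derivable* m_solution;
  float      m_lr;
};
```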