estool
Natural gradients for deep layers
In NES algorithms, do we use backpropagation from the last layer's gradients (computed by the objective function)? I am curious how to optimize the hidden layers, since they do not directly affect the objective function.
Evolution strategies do not use backpropagation; they differentiate (approximately) across a population of solutions, rather than across the parameters within a single solution. Perhaps reading this blog post is useful :stuck_out_tongue:
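To make the "differentiate across a population" idea concrete, here is a minimal sketch of a simple NES-style update in NumPy. It is not estool's implementation; the objective, population size, and learning rate are illustrative assumptions. The key point: every parameter (including those of "hidden layers") is perturbed and updated using only scalar fitness values from the population, so no per-layer gradients or backprop are needed.

```python
import numpy as np

def es_gradient_step(theta, objective, pop_size=50, sigma=0.1, lr=0.05, rng=None):
    """One NES-style update: estimate the gradient of the expected
    objective from the fitness of a population of perturbed solutions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample a population of Gaussian perturbations of the parameters.
    eps = rng.standard_normal((pop_size, theta.size))
    # Evaluate each perturbed solution -- only scalar fitness is needed.
    fitness = np.array([objective(theta + sigma * e) for e in eps])
    # Normalize fitness to reduce the variance of the estimate.
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    # Log-likelihood-trick estimate of the gradient w.r.t. theta.
    grad = eps.T @ fitness / (pop_size * sigma)
    return theta + lr * grad  # ascend the estimated gradient

# Toy example (hypothetical objective): maximize a negative squared error.
# Every entry of theta is treated identically -- the algorithm never
# distinguishes "output" parameters from "hidden" ones.
rng = np.random.default_rng(0)
target = np.arange(5.0)
objective = lambda t: -np.sum((t - target) ** 2)
theta = np.zeros(5)
for _ in range(2000):
    theta = es_gradient_step(theta, objective, rng=rng)
```

After enough iterations `theta` drifts toward `target`, even though the update only ever sees population fitness values, never a gradient from the objective.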