
Writing a custom optimizer

Open askrix opened this issue 7 years ago • 5 comments

Mr. Malmaud and Mr. White,

in the current version of "TensorFlow.jl" (v0.10.2), two optimizers are provided: train.AdamOptimizer() and train.GradientDescentOptimizer(). TensorFlow's Python API, on the other hand, provides eight additional optimizers. At the end of September this year a publication appeared (I will provide the link) which shows that Adam does not always converge, in contrast to SGD. For my project I'd like to implement SGD. The way to do so was discussed on Stack Overflow. Furthermore, I found this article which describes in great detail how to implement a custom optimizer for TensorFlow in Python. My question is whether it's possible to do the same in Julia using "TensorFlow.jl".

Many thanks in advance and best regards, Askrix

askrix avatar Oct 02 '18 14:10 askrix

Yes, it is possible.

See the code in https://github.com/malmaud/TensorFlow.jl/blob/master/src/train.jl

If you do implement a cool new (or old) optimizer, feel encouraged to make a PR.
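As a starting point, here is a rough, untested sketch of the core idea those optimizers are built around: take gradients of the loss with respect to each variable and emit assign ops that update it. Everything below (the variable names, the learning rate, Himmelblau's function as a toy loss) is illustrative and not taken from train.jl:

```julia
using TensorFlow

sess = Session(Graph())

# Parameters to optimize, starting from an arbitrary point.
x = Variable(3.0)
y = Variable(3.0)

# Himmelblau's function f(x, y) = (x² + y - 11)² + (x + y² - 7)² as a toy loss.
a = x .* x + y - 11
b = x + y .* y - 7
loss = a .* a + b .* b

# Plain gradient descent: θ ← θ - η*∇f(θ), expressed as assign ops.
η = constant(0.01)            # illustrative learning rate
gx = gradients(loss, x)
gy = gradients(loss, y)
step = [assign(x, x - η .* gx), assign(y, y - η .* gy)]

run(sess, global_variables_initializer())
for _ in 1:200
    run(sess, step)
end
println(run(sess, [x, y, loss]))   # should approach the minimum near (3, 2)
```

The optimizer types in train.jl package this same kind of update behind train.minimize, so once your update ops are right, fitting them into the existing pattern should be mostly mechanical.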

oxinabox avatar Oct 02 '18 14:10 oxinabox

Mr. White,

here is the link to the article I wrote about last week.

Best regards, Askrix

askrix avatar Oct 08 '18 11:10 askrix

Mr. White,

I chose Himmelblau's function from the list of Test functions for optimization and implemented simple gradient descent, momentum, and NAG (Nesterov accelerated gradient). After testing, I chose NAG and, following the source code from the link you provided, tried to implement the algorithm for "TensorFlow.jl". Regarding the implementation, I have the following questions. Basically, NAG consists of two steps: ν = γ*ν + η*∇f(θ - γ*ν); θ = θ - ν. I don't understand how I can force "grad" to be evaluated at θ - γ*ν instead of at θ. Secondly, how do I create a pull request properly? What are "base:" and "compare:" supposed to be?
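For reference, in plain Julia the NAG loop I am describing looks roughly like this (γ, η, the starting point, and the iteration count are illustrative choices):

```julia
# Himmelblau's function and its analytic gradient.
f(θ) = (θ[1]^2 + θ[2] - 11)^2 + (θ[1] + θ[2]^2 - 7)^2
∇f(θ) = [4θ[1]*(θ[1]^2 + θ[2] - 11) + 2(θ[1] + θ[2]^2 - 7),
         2(θ[1]^2 + θ[2] - 11) + 4θ[2]*(θ[1] + θ[2]^2 - 7)]

# Nesterov accelerated gradient: the gradient is taken at the
# look-ahead point θ - γν, not at θ itself.
function nag(θ; γ = 0.9, η = 0.001, steps = 1000)
    ν = zero(θ)
    for _ in 1:steps
        ν = γ .* ν .+ η .* ∇f(θ .- γ .* ν)
        θ = θ .- ν
    end
    return θ
end

println(nag([3.0, 3.0]))   # ≈ [3.0, 2.0], one of Himmelblau's minima
```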

Best regards, Askrix

askrix avatar Oct 11 '18 13:10 askrix

I don't understand how I can force "grad" to be evaluated at θ - γ*ν instead of at θ.

I am not sure what the best way to achieve that is.

I don't really know much about NAG, but gradients(f, x) is how you find the gradient of f with respect to x.
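That said, one approach that might work is to build the look-ahead point θ - γ*ν as expressions in the graph and differentiate a copy of the loss written in terms of them: since ν lives in its own Variables, the gradient of that loss with respect to θ is exactly ∇f(θ - γ*ν). A rough, untested sketch, with all names and constants being illustrative:

```julia
using TensorFlow

sess = Session(Graph())

x  = Variable(3.0);  y  = Variable(3.0)    # parameters θ
vx = Variable(0.0);  vy = Variable(0.0)    # velocity ν
γ  = constant(0.9)                         # illustrative momentum
η  = constant(0.001)                       # illustrative learning rate

# Look-ahead point θ - γν, built as graph expressions.
x_ahead = x - γ .* vx
y_ahead = y - γ .* vy

# Himmelblau's function evaluated at the look-ahead point.
a = x_ahead .* x_ahead + y_ahead - 11
b = x_ahead + y_ahead .* y_ahead - 7
loss_ahead = a .* a + b .* b

# Because ν is held in separate Variables, these gradients are ∇f(θ - γν).
gx = gradients(loss_ahead, x)
gy = gradients(loss_ahead, y)

# ν ← γν + η∇f(θ - γν);  θ ← θ - ν
vx_new = γ .* vx + η .* gx
vy_new = γ .* vy + η .* gy
step = [assign(vx, vx_new), assign(vy, vy_new),
        assign(x, x - vx_new), assign(y, y - vy_new)]

run(sess, global_variables_initializer())
for _ in 1:500
    run(sess, step)
end
println(run(sess, [x, y]))   # should head toward the minimum near (3, 2)
```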

oxinabox avatar Oct 12 '18 07:10 oxinabox

Mr. White,

thank you for the answer. I'll try it out. For now, I have created a new pull request with my implementations of "Nadam" and "AMSGrad".

Best regards, Askrix

askrix avatar Oct 12 '18 12:10 askrix