
LBFGS optimizer

Open qzhu2017 opened this issue 3 years ago • 3 comments

I know that most people use Adam or SGD to optimize the weights of a neural network. However, for some small NN architectures, an optimization method with line search (e.g., LBFGS) would be far more efficient. I wonder if the Knet developers would be interested in implementing LBFGS?

qzhu2017 avatar Aug 06 '20 06:08 qzhu2017

Optim.jl may already have this.
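For reference, a minimal sketch of Optim.jl's L-BFGS call (a toy least-squares objective, not Knet code; autodiff = :forward asks Optim.jl to compute the gradient with ForwardDiff):

using Optim

# L-BFGS with line search is available out of the box in Optim.jl.
result = optimize(x -> sum(abs2, x .- 1), zeros(5), LBFGS(); autodiff = :forward)
Optim.minimizer(result)   # ≈ ones(5)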

denizyuret avatar Aug 06 '20 06:08 denizyuret

Yes, I know. Can I use it with Knet? I tried, but it is not obvious how. To be more specific, Optim.jl needs the following in order to optimize a function f(x):

  • x
  • f
  • g

Both x and g need to be flattened into a 1D array. The main trouble is g: my student and I tried to transform g into something that Optim.jl accepts, but we failed. Has anyone had experience with this? (A sketch of the interface Optim.jl expects follows below.)
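For illustration, a minimal sketch of that interface using the Rosenbrock function as a stand-in objective (nothing here is Knet-specific):

using Optim

# What Optim.jl's first-order methods expect:
#   x0 — a flat vector of parameters
#   f  — f(x) returning a scalar
#   g! — g!(G, x) writing the gradient into the preallocated vector G
f(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
function g!(G, x)
    G[1] = -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2)
    G[2] = 200 * (x[2] - x[1]^2)
    return G
end

optimize(f, g!, zeros(2), LBFGS())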

qzhu2017 avatar Aug 06 '20 06:08 qzhu2017

using Knet              # Param, @diff, grad come from AutoGrad, re-exported by Knet

x = Param(Array(...))   # wrap the weight array so AutoGrad can track it
J = @diff f(x)          # run f and record the computation on a tape
g = grad(J, x)          # gradient of f with respect to x

This should give you a g with the same type and shape as x. See @doc AutoGrad for details.
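To hand this to Optim.jl, the Param and its gradient still need to be flattened into plain vectors. A rough, untested sketch of the glue (the single Param x and the loss f below are placeholders; Optim.only_fg! lets one callback compute the value and the gradient together):

using Knet, Optim         # Param, @diff, grad come from AutoGrad, re-exported by Knet

x = Param(randn(10, 5))   # placeholder weights; replace with your model's parameters
f(w) = sum(w .* w)        # placeholder scalar loss; replace with your own

function fg!(F, G, w)
    copyto!(value(x), w)              # load Optim.jl's flat vector into the Param
    J = @diff f(x)                    # record the loss on an AutoGrad tape
    if G !== nothing
        G .= vec(Array(grad(J, x)))   # gradient has the shape of x; flatten it for Optim.jl
    end
    return F === nothing ? nothing : value(J)
end

result = optimize(Optim.only_fg!(fg!), vec(Array(value(x))), LBFGS())
Optim.minimizer(result)               # flat vector of optimized weights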

denizyuret avatar Aug 19 '20 12:08 denizyuret