
Easier auto-differentiation support?

Open jlperla opened this issue 6 years ago • 12 comments

Right now you need to manually set up auto-differentiation for functions, which is more cumbersome than in other packages (e.g. Optim, or JuMP directly). Would you be willing to entertain a PR to make this a little easier to work with?

You can see https://github.com/ubcecon/computing_and_datascience/blob/master/julia_tutorials/nlopt/nlopt-tutorial.ipynb for an example of what it might look like. Basically, you could just call an adapter on functions and pass that object directly into your existing functions.

The downside is that this would require adding https://github.com/JuliaNLSolvers/NLSolversBase.jl as a dependency with the following sub-dependencies: https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/master/REQUIRE

jlperla avatar Feb 28 '19 18:02 jlperla

Another idea to throw out there. We have an abstraction for NLP solvers in MathOptInterface with multiple backends, e.g., Ipopt and Knitro. (NLopt has a wrapper for the old MathProgBase API but not yet MathOptInterface.) If you target your adapters to MathOptInterface instead of NLopt directly then you get multiple backends without much extra cost.

mlubin avatar Mar 01 '19 03:03 mlubin

@mlubin I really hope that NLopt gets hooked up to MOI soon so that I can swap it in for Ipopt when using JuMP.

Maybe I am misunderstanding, but if you are suggesting that given MOI we would instead use it directly for calling the optimizer, I am not sure I see it as it currently stands. The complexity of MOI is necessary to drive an EDSL like JuMP, but it is very heavy for those of us who just want to call a nonlinear solver and pass in arbitrary functions of vectors. Perhaps JuMP will become a little more friendly for those who want nonlinear solvers without a DSL, or perhaps there is another interface built on MOI for people like me?

Just so you know, the total code required to add Optim-style AD support to NLopt is just

using NLSolversBase
struct NLoptAdapter{T <: AbstractObjective} <: Function
    nlsolver_base::T
end

# implement fg!; note that the order is reversed
(adapter::NLoptAdapter)(x, df) = adapter.nlsolver_base.fdf(df, x)
(adapter::NLoptAdapter)(result, x, jacobian_transpose) = adapter.nlsolver_base.fdf(result, jacobian_transpose', x)

# constructors
NLoptAdapter(f, x, autodiff = :forward) = NLoptAdapter(OnceDifferentiable(f, x, autodiff = autodiff))
NLoptAdapter(f!, x::Vector, F::Vector, autodiff = :forward) = NLoptAdapter(OnceDifferentiable(f!, x, F, autodiff = autodiff))
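For illustration, here is a hypothetical usage sketch of the adapter above (it assumes the `NLoptAdapter` definition and NLopt's existing `min_objective!`; the objective and tolerance are illustrative):

```julia
using NLopt, NLSolversBase

f(x) = sum(abs2, x .- 1.0)      # smooth test objective, minimum at ones(n)
x0 = zeros(2)

adapter = NLoptAdapter(f, x0)    # wraps a OnceDifferentiable with :forward AD
opt = Opt(:LD_LBFGS, 2)
xtol_rel!(opt, 1e-6)
min_objective!(opt, adapter)     # adapter is callable as (x, grad) -> value
(minf, minx, ret) = optimize(opt, x0)
```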

jlperla avatar Mar 01 '19 06:03 jlperla

Something along these lines seems reasonable to me.

stevengj avatar Mar 01 '19 12:03 stevengj

Great. @arnavs will prepare a full PR for this with the necessary tests, documentation, etc.

At this point, we think that this will require no direct changes to the NLopt code itself (though you may have ideas of how it could be reorganized). By using NLSolversBase as a dependency, we may be able to hook in other AD implementations if they are added to that library (@arnavs may also try to add Flux.jl or Zygote.jl support to NLSolversBase in a separate issue).

(1) We don't add an exported wrapper type, and instead add arguments to the functions themselves, i.e. add methods shadowing the existing ones so it would look like:

# A setup from our existing tests
M = 5000
x0 = fill(1.0, M)
m = range(2, 5, length = M) 
f(x) = sum((x .- m).^2) + sum(log.(x))

# define the optimization problem
opt = Opt(:LD_LBFGS, M)
min_objective!(opt, f, autodiff=:forward)

The list of additional methods would be something like

min_objective!(opt::Opt, f::Function, autodiff)
max_objective!(opt::Opt, f::Function, autodiff)
inequality_constraint!(opt::Opt, fc, autodiff, tol = 0)
equality_constraint!(opt::Opt, h, autodiff, tol = 0)
inequality_constraint!(opt::Opt, c, tol::AbstractVector, autodiff)
equality_constraint!(opt::Opt, c, tol::AbstractVector, autodiff)

where autodiff would support the NLSolversBase symbols in https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/master/src/NLSolversBase.jl#L55 (i.e. :forward, :finite or :finiteforward, I think?).

(2) Alternatively, the less invasive option is to have a publicly exposed wrapper:

# A setup from our existing tests
M = 5000
x0 = fill(1.0, M)
m = range(2, 5, length = M) 
f(x) = sum((x .- m).^2) + sum(log.(x))

# define the optimization problem
opt = Opt(:LD_LBFGS, M)
f_opt = NLoptDifferentiable(f, x0) # The only new code!
min_objective!(opt, f_opt)

For the name of the adapter:

  • Optim.jl has some algorithms which use hessians, and hence distinguishing between OnceDifferentiable and TwiceDifferentiable is necessary there. It didn't appear that NLopt's Julia interface supports any algorithms with hessians in the objective. Did I miss anything?
  • If not, then NLoptDifferentiable or even Differentiable would work (we don't want to collide with NLSolversBase.OnceDifferentiable).


@stevengj If you have no strong opinions, I think we should first try to add it directly to those functions rather than via wrappers. That would be the easiest for end-users.

jlperla avatar Mar 02 '19 05:03 jlperla

@arnavs: A few notes:

  • If we try to add new methods for min_objective!, etc., we will need to deal with the fact that we cannot construct an NLSolversBase.OnceDifferentiable without a vector of the right type to pass in. OnceDifferentiable uses it to create a cache, so it is the type that matters rather than the values in the vector.
  • For this, I think you can query the opt::Opt for the dimension of the objective and create a zeros(Float64, ...) to pass in. The length of the tol::AbstractVector for the vector inequality and equality constraints should be useful for sizing the Jacobian, and the univariate inequality and equality constraints should not be too difficult to figure out.
  • The code for creating min_objective! and max_objective! is at https://github.com/JuliaOpt/NLopt.jl/blob/master/src/NLopt.jl#L426, which is basically a proxy to the ccall into the underlying library. You could define the new min_objective! etc. outside of that sort of fancy @eval and have them call these existing methods.
  • For the unit tests, it is essential to try all the variations of finite differences and forward differences for the univariate objective, multivariate objective, univariate equality constraint, univariate inequality constraint, multivariate equality constraint, and multivariate inequality constraint. You can construct artificially simple optimization problems for those tests.
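A rough sketch of the dimension-query idea (a hypothetical helper, not part of NLopt.jl; it assumes NLopt.jl's `ndims(opt::Opt)` accessor and NLSolversBase's `OnceDifferentiable` constructor):

```julia
using NLopt, NLSolversBase

# Hypothetical: build an NLopt-style (x, grad) objective from f, using the
# dimension stored in opt to seed the OnceDifferentiable cache.
function make_objective(opt::Opt, f; autodiff = :forward)
    x_seed = zeros(Float64, ndims(opt))  # only size/type matter, not the values
    od = OnceDifferentiable(f, x_seed; autodiff = autodiff)
    # NLopt passes (x, grad); OnceDifferentiable's fdf expects (grad, x)
    return (x, grad) -> length(grad) > 0 ? od.fdf(grad, x) : od.f(x)
end
```

The same pattern would extend to the vector constraints, with the length of the tol vector supplying the Jacobian's row count.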

jlperla avatar Mar 02 '19 05:03 jlperla

This thread is moving in the direction of adding direct support for AD to NLopt, which is fine with me. But anyway, to address the point on MOI:

Maybe I am misunderstanding, but if you are suggesting that given MOI we would instead use it directly for calling the optimizer, I am not sure I see it as it currently stands. The complexity of MOI is necessary to drive a EDSL like JuMP, but it is very heavy for those of us that just want to call a nonlinear solver and pass in arbitrary functions of vectors.

Yes, I was suggesting to call the optimizers via MOI. Yes, there are a few more lines of code required, but if you're already in the business of designing APIs for nonlinear optimization (as discussed in this thread), then I don't expect it to be a significant challenge. There are a number of people familiar with the MOI API who can answer questions at https://gitter.im/JuliaOpt/JuMP-dev. I also have an old demo notebook of connecting Newton's method to MathProgBase: https://nbviewer.jupyter.org/github/JuliaOpt/juliaopt-notebooks/blob/master/notebooks/MathProgBase%20Newton%20solver.ipynb. The API for interacting with derivatives is mostly unchanged from MPB to MOI.

The main extra things that MOI supports over the Optim/NLopt-type interfaces are:

  • Sparse hessian matrices
  • Separate handling of linear, quadratic, and other classes of constraints.

I understand that a lot of use cases for nonlinear optimization don't need to think about sparse hessian matrices or linear constraints. Ultimately it's up to you to decide if the extra complexity of the abstraction is worth it.

Perhaps JuMP will become a little more friendly for those who want nonlinear solvers without a DSL, or perhaps there is another interface built on MOI for people like me?

The JuliaSmoothOptimizers organization has built a good amount of infrastructure that could be useful. As far as I know it hasn't been updated for MOI yet though. In any case, the best way for a simpler NLP interface to be built on top of MOI is for someone invested in the result to write the code. It's a question of a couple days, not weeks or months.

mlubin avatar Mar 02 '19 15:03 mlubin

but if you're already in the business of designing APIs for nonlinear optimization (as discussed in this thread)

That is close to the last thing that I want to do. My only goal here is to hack AD support into NLopt to tide us over until there are interfaces that people like me want to use. To me, the gold standard is https://tomopt.com/tomlab/: pass in functions for the constraints and objectives, matrices (or matrix-free operators at some point) for the linear constraints, and settings... essentially DSL-free.

We can discuss further in Slack to see if an NLP interface based on MOI is possible and whether there are people who could help design it. I am not sure I have access to people with sufficient expertise. (BTW, you had previously mentioned JuliaSmoothOptimizers to me, but I was never able to navigate it to find the interface they were proposing.)

jlperla avatar Mar 03 '19 19:03 jlperla

Rather than min_objective!(opt, f, autodiff=:forward), I would suggest something like min_objective!(opt, autodiff(f)), where autodiff(f) takes a function f(x) and returns an (x, grad) -> ... function of the form expected by nlopt.

Then you are just adding a new autodiff function and don't need to modify the existing functions at all. In principle, the autodiff function (or whatever it is called) could even live in a separate package, e.g. NLSolversBase.

stevengj avatar Mar 06 '19 18:03 stevengj

Something along these lines? (I use this to add forward-differentiation support in my code.)

# NLopt-style objective: writes the gradient into g (when requested) and returns f(x)
function nlopt_form(f, x, g, diffresult_cache)
    if length(g) > 0
        ForwardDiff.gradient!(diffresult_cache, f, x)
        g .= DiffResults.gradient(diffresult_cache)
        return DiffResults.value(diffresult_cache)
    else
        return f(x)
    end
end
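For example, `nlopt_form` can be hooked up through a closure that preallocates the DiffResults cache once (a sketch; the objective, dimension, and tolerance are illustrative):

```julia
using NLopt, ForwardDiff, DiffResults

f(x) = sum(abs2, x .- 2.0)                    # minimum at fill(2.0, n)
n = 3
cache = DiffResults.GradientResult(zeros(n))  # allocated once, reused every call

opt = Opt(:LD_LBFGS, n)
xtol_rel!(opt, 1e-6)
min_objective!(opt, (x, g) -> nlopt_form(f, x, g, cache))
(minf, minx, ret) = optimize(opt, zeros(n))
```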

using that as a base,

function autodiff(f)
    return function (x, grad)
        if length(grad) > 0
            # allocates a fresh cache on every call; see nlopt_form above for a reusable cache
            diffresult_cache = DiffResults.GradientResult(x)
            ForwardDiff.gradient!(diffresult_cache, f, x)
            grad .= DiffResults.gradient(diffresult_cache)
            return DiffResults.value(diffresult_cache)
        else
            return f(x)
        end
    end
end

The problem with adding this to NLSolversBase is that the check for the existence of the gradient is different: here it checks the length, while Optim checks for nothing.

longemen3000 avatar Sep 02 '19 16:09 longemen3000

An improved version that works with the future NLsolvers and with NLopt:

function autodiff(fn)
    function f(x)
        return fn(x)
    end

    function f(∇f, x)
        if ∇f !== nothing && length(∇f) != 0
            ForwardDiff.gradient!(∇f, fn, x)
        end

        fx = fn(x)
        return ∇f === nothing ? fx : (fx, ∇f)
    end

    function f(∇²f, ∇f, x)
        if ∇²f !== nothing
            ForwardDiff.hessian!(∇²f, fn, x)  # the hessian goes into ∇²f, not ∇f
        end

        if ∇²f === nothing
            return f(∇f, x)
        else
            fx, ∇f = f(∇f, x)
            return fx, ∇f, ∇²f
        end
    end

    return f
end

longemen3000 avatar Sep 08 '19 07:09 longemen3000

Just as an update here, I found the above to not be working (perhaps it did when posted). Here is what I found to work:

function autodiff(fn)
    # adapted from https://github.com/JuliaOpt/NLopt.jl/issues/128
    function f(x)
        return fn(x)
    end

    function f(x, ∇f)
        if ∇f !== nothing && length(∇f) != 0
            ForwardDiff.gradient!(∇f, fn, x)
        end

        fx = fn(x)  # note: fn is evaluated again here, in addition to the gradient call
        return fx
    end

    return f
end
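A usage sketch for the version above (note that its `(x, ∇f)` argument order matches NLopt's callback convention directly; the objective and tolerance are illustrative):

```julia
using NLopt, ForwardDiff

rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

opt = Opt(:LD_LBFGS, 2)
opt.xtol_rel = 1e-8
opt.min_objective = autodiff(rosenbrock)  # gradient filled in by ForwardDiff
(minf, minx, ret) = optimize(opt, [0.0, 0.0])
# for a successful run, minx should be near [1.0, 1.0]
```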

tbeason avatar Feb 15 '22 16:02 tbeason

I use this in practice:

import AbstractDifferentiation as AD

function gradient_on_demand(ad_backend, f)
    function(x, ∇fx)
        if isempty(∇fx)
            f(x)
        else
            v, (g,) = AD.value_and_gradient(ad_backend, f, x)
            copy!(∇fx, g)
            v
        end
    end
end

Example:

using NLopt, ForwardDiff
ab = AD.ForwardDiffBackend()
opt = Opt(:LD_SLSQP, 3)
equality_constraint!(opt, gradient_on_demand(ab, x -> sum(x) - 3), 1e-4)
opt.min_objective = gradient_on_demand(ab, x -> sum(abs2, x))
(minf, minx, ret) = optimize(opt, [-1.0, 1.5, 9.9])

Would be happy to make a PR; AbstractDifferentiation is not a heavy dependency (per se). And Enzyme support is WIP.

tpapp avatar Oct 10 '23 09:10 tpapp