Optimization.jl
[WIP] Add Manopt.jl wrapper
Aimed at moving the already substantial work done in #437 forward
Checklist
- [ ] Appropriate tests were added
- [ ] Any code changes were done in a way that does not break public API
- [ ] All documentation related to code changes were updated
- [ ] The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
- [ ] Any new documentation only uses public API
Additional context
In addition to the existing implementation picked up from that PR, we would like to support an interface for specifying the manifold as a constraint or as metadata of the optimization variable, especially from the MTK interface.
But maybe the solver approach here makes more sense, need to think more about this
Hi, thanks for taking over (and sorry for being so inactive on the other one). I can try to comment a bit more on the current state here, but first one remark on the constraint question.
I prefer to think of the manifold as a domain, not a constraint. The reason is that the algorithms in Manopt work intrinsically, without having any embedding (where the manifold would be the constraint) in mind: they require an inner product on the tangent spaces, sometimes a vector transport, and a retraction (maybe its inverse as well), but not a projection (for SPD matrices, for example, such a projection does not even exist, since that set is open in its embedding).
The JuMP extension that Benoît was so nice to program does this in a nice way, I think; see https://manoptjl.org/stable/extensions/#JuMP.jl
They use `@variable(model, U[1:2, 1:2] in Stiefel(2, 2), start = 1.0)`, so the syntax `p in M` specifies the domain of `p` to be the manifold `M`.
But maybe the solver approach here makes more sense, need to think more about this
In theory, the manifold is clearly a part of the problem definition, not of a solver. As Ronny wrote, it may even be considered a domain and not a constraint, similarly to how boolean and integer parameters are usually not handled as constraints on real values but as a thing of their own. In those cases there is overwhelmingly clear evidence that this is the way to go, but IMO in the case of manifolds it is still an open problem.
One interesting benefit of treating them like domains is that manifold optimizers like Manopt.jl minimize the need to flatten the structure of variables and can naturally support tangents of a different type than the optimized variable. I'm not sure how well known that is, but "you don't have to flatten" would be my best argument at this time for considering manifolds a domain.
In practice, if you only ever use manifolds with Manopt.jl, it would likely be easier to make them a part of the solver.
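The "you don't have to flatten" point above can be made concrete with a small self-contained sketch (the types and functions below are illustrative stand-ins, not Manopt.jl or Manifolds.jl API): a point on a product of a sphere and a matrix manifold keeps its natural structure, while a flattening-based optimizer has to serialize it into one long vector first.

```julia
# Hypothetical stand-in for a structured point on a product manifold;
# a manifold optimizer can work with this representation directly.
struct ProductPoint{A,B}
    sphere_part::A   # a unit vector in R^3
    matrix_part::B   # a 2x2 matrix
end

# A flattening-based (Euclidean) optimizer instead needs one long vector,
# losing the structure and the manifold interpretation of each part.
flatten(p::ProductPoint) = vcat(p.sphere_part, vec(p.matrix_part))

p = ProductPoint([1.0, 0.0, 0.0], [1.0 2.0; 3.0 4.0])
v = flatten(p)   # a plain 7-element vector
```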
With the symbolic interface (through MTK) we should be able to support something similar to the domain definition; to me that seems most intuitive as well. But I can see that being a bit cumbersome when you have to define multiple variables, and since all of them usually have to belong to the same manifold, the solver approach does turn out to be easier from both the user and the development point of view, so I'd like to support both.
For multiple variables we have the power manifold, even with the shortcut `M^3` for manifold-valued vectors or `M^(3, 4)` for a matrix of variables on a manifold. But also product manifolds are available (written `M × N`, overloading the `×` operator). This should make multiple variables easier – and it already exists.
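Under some illustrative stand-in types (Manifolds.jl provides the real `Sphere`, `PowerManifold`, and `ProductManifold` types; the mock-ups below only show how the shortcut syntax can work), the `M^3`, `M^(3, 4)`, and `M × N` shortcuts can be sketched like this:

```julia
# Mock manifold type, standing in for e.g. Sphere(2) from Manifolds.jl.
struct MockSphere end

# M^n and M^(n, m) build a "power manifold" of several variables on M.
struct MockPower{M,S}
    manifold::M
    size::S
end
Base.:^(M::MockSphere, n::Integer) = MockPower(M, (n,))
Base.:^(M::MockSphere, dims::Tuple) = MockPower(M, dims)

# M × N builds a product of two (possibly different) manifolds.
struct MockProduct{A,B}
    first::A
    second::B
end
×(M, N) = MockProduct(M, N)  # overload the × infix operator

M = MockSphere()
V = M^3        # a vector of three manifold-valued variables
W = M^(3, 4)   # a 3×4 matrix of manifold-valued variables
P = M × M      # a product manifold
```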
The newest test already looks quite nice, just that philosophically I would prefer to have the manifold in the objective not the solver.
I completely agree @kellertuer, but I can't think of a solution to this right now. In a functional interface it is pretty hard to find a membership syntax. If you think about it, this is pretty similar to how Manopt.jl's interface looks as well, so it shouldn't be very unintuitive.
using LinearAlgebra, Manifolds, Manopt, Optimization, OptimizationManopt

M = SymmetricPositiveDefinite(5)
m = 100
σ = 0.005
q = Matrix{Float64}(I, 5, 5) .+ 2.0
data2 = [exp(M, q, σ * rand(M; vector_at=q)) for i in 1:m];
f(M, x, p = nothing) = sum(distance(M, x, data2[i])^2 for i in 1:m) # Manopt.jl-style signature
f(x, p = nothing) = sum(distance(M, x, data2[i])^2 for i in 1:m) # Optimization.jl-style signature, used below; M is captured from the enclosing scope
optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, data2[1])
opt = OptimizationManopt.GradientDescentOptimizer(M)
@time sol = Optimization.solve(prob, opt)
vs
using Manifolds, Manopt

n = 100
σ = π / 8
M = Sphere(2)
p = 1 / sqrt(2) * [1.0, 0.0, 1.0]
data = [exp(M, p, σ * rand(M; vector_at=p)) for i in 1:n];
f(M, p) = sum(1 / (2 * n) * distance.(Ref(M), Ref(p), data) .^ 2)
grad_f(M, p) = sum(1 / n * grad_distance.(Ref(M), data, Ref(p)));
m1 = gradient_descent(M, f, grad_f, data[1])
In Manopt.jl, too, the manifold gets passed only at the optimizer invocation.
My fear is that this is too prone to error for the following reason.
you can think of
m1 = gradient_descent(M, f, grad_f, data[1])
as
m1 = gradient_descent((M, f, grad_f), data[1])
so internally we actually have an objective (that would store `f` and `grad_f`) and a problem that stores the objective and the manifold (!)
Already the initial point (here `data2[1]`?) is, in Manopt.jl, part of the “setting” of the solver, not the problem (but that might really be philosophical).
But that is not so nice to call, so I would prefer a syntax more like
optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(M, optf, data2[1])
opt = OptimizationManopt.GradientDescentOptimizer()
@time sol = Optimization.solve(prob, opt)
(yes, I have not looked at the internals, sorry)
This would actually be what Manopt does.
Otherwise I would fear that a user might too easily do
opt = OptimizationManopt.GradientDescentOptimizer(M2) # for some random other manifold
@time sol = Optimization.solve(prob, opt)
and I fear this is really something I would avoid leading the user into – not only because I do not want to explain (again and again) why this tremendously fails; even if you take the same manifold with just another metric, your gradient might already be completely wrong.
That makes sense, I'll figure out the best way to make it work
One thing one could do, is to define a new constructor for the OptimizationProblem
whose first argument is a manifold. This is stored in the keywords and passed on to a solver? Would that make sense?
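The suggested constructor can be sketched with self-contained mock types (all names below are hypothetical stand-ins, not the actual Optimization.jl types): the manifold-first positional argument is simply stashed into the keyword storage, from where a solver can pick it up later.

```julia
struct MockManifold end  # stand-in for e.g. SymmetricPositiveDefinite(5)

# Stand-in for OptimizationProblem: keywords are stored as a NamedTuple.
struct MockProblem{F,X,K}
    f::F
    x0::X
    kwargs::K
end
MockProblem(f, x0; kwargs...) = MockProblem(f, x0, (; kwargs...))

# New convenience constructor: the manifold comes first and is simply
# stored in the keywords, to be passed on to the solver.
MockProblem(M::MockManifold, f, x0; kwargs...) =
    MockProblem(f, x0; manifold = M, kwargs...)

prob = MockProblem(MockManifold(), sum, [1.0, 2.0])
prob.kwargs.manifold   # the solver can read the manifold from here
```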
I checked both the objective and the problem. While both have quite a few fields, they all seem to be very much tailored towards constrained problems and objectives (up to second order) and their solvers.
This should be ready now.
The syntax looks like this
optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(M, optf, data2[1], manifold = M)
and if the optimizer's manifold doesn't match the problem's it throws an error
"Either manifold not specified in the problem `OptimizationProblem(f, x, p; manifold = SymmetricPositiveDefinite(5))` or it doesn't match the manifold specified in the optimizer `$(opt.M)`"
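The check described above can be sketched as follows (a hypothetical standalone helper, not the wrapper's actual code): the problem's manifold keyword may be `nothing`, and the call errors out unless it is set and agrees with the optimizer's manifold.

```julia
# Hypothetical helper: `problem_manifold` comes from the problem's
# keyword storage (possibly `nothing`), `optimizer_manifold` from the
# optimizer struct; error out unless both are present and agree.
function check_manifolds_match(problem_manifold, optimizer_manifold)
    if problem_manifold === nothing || problem_manifold != optimizer_manifold
        throw(ArgumentError(
            "Either the manifold is not specified in the problem " *
            "or it does not match the manifold specified in the optimizer"))
    end
    return true
end

check_manifolds_match(:SPD5, :SPD5)       # passes
# check_manifolds_match(nothing, :SPD5)   # would throw an ArgumentError
# check_manifolds_match(:Sphere2, :SPD5)  # would throw an ArgumentError
```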
The syntax looks like this
optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(M, optf, data2[1], manifold = M)
That is exactly what I had in mind
and if the optimizer's manifold doesn't match the problem's it throws an error
"Either manifold not specified in the problem `OptimizationProblem(f, x, p; manifold = SymmetricPositiveDefinite(5))` or it doesn't match the manifold specified in the optimizer `$(opt.M)`"
It is already quite late here, so I will only have time tomorrow afternoon to check the whole code – but what is the reason to store the manifold in the optimiser? Sure, the constructors sometimes need the manifold, so we could keep that (for them to fill in reasonable defaults), but if the optimiser did not store the manifold (since it is now stored in the problem anyway), then why store-twice-and-error instead of store-once-and-be-happy?
One thing I do not yet fully understand and/or see is, whether each and every solver in Manopt needs a wrapping struct here. But maybe that is necessary.
That is for now all I have in comments.
I don't like that either. I was actually trying to figure out how to avoid that already!
I can check that as well, but not for the next week or two, since we are currently in the last few weeks of the semester, and besides designing an exam somewhere in between, the requests by students keep rising.
For now I was just wondering about that, because it seems to store a few parameters/settings (like the retraction or the vector transport) but not others. But on the other hand, solvers are usually also specified in Optimization.jl by structs, right? At least that is how I read the second argument of the line
sol = solve(prob, BFGS())
from the README; sure, the structs here are maybe really just the same as that (just that here the retraction and vector transport often have to be stored then).
I think it is possible to avoid them. Will push once I get it working
Oh, I was wrong – since you don't have any method structs defined in Manopt.jl, we'd need to create them here for the API to match the other solver wrappers.
Can you elaborate a bit what you mean with method structs?
The reason I came to that question is that I think all the wrapper structs here are very similar (just with fewer variables stored) to the structs called states in Manopt.jl (https://github.com/JuliaManifolds/Manopt.jl/blob/1ea2ac69d1ec48f3029841ddcee4af8958d361af/src/solvers/gradient_descent.jl#L2 ).
So what this wrapper here currently does is: inventing a new problem (https://manoptjl.org/stable/plans/problem/), objective (https://manoptjl.org/stable/plans/objective/), and a state like the one above, then unwrapping all that when called, calling the (high-level) interface `gradient_descent`, which internally again produces a problem, an objective, and a state.
When all that is finished, the result is returned, but I feel this currently doubles all the work of problem, objective, and state. Maybe that is necessary; I have not yet fully understood the Optimization API to that extent.
@kellertuer the Optimization.jl API looks like this in general
f(x, p) = ...
constraints(x, p) = ...
optf = OptimizationFunction(f, AutoEnzyme())
prob = OptimizationProblem(optf, x0)
sol = solve(prob, Optimizer())
hence it needs a struct to instantiate the `Optimizer()`. Since Manopt doesn't have its own types/structs for these, we have to create them here.
I did see the `AbstractManoptSolverState`, and this could be used to create the list of all the methods, something like
julia> subtypes(Manopt.AbstractManoptSolverState)
15-element Vector{Any}:
AbstractGradientSolverState
AbstractPrimalDualSolverState
AdaptiveRegularizationState
ConvexBundleMethodState
CyclicProximalPointState
DebugSolverState
DouglasRachfordState
LanczosState
Manopt.AbstractSubProblemSolverState
Manopt.ReturnSolverState
NelderMeadState
ParticleSwarmState
ProximalBundleMethodState
RecordSolverState
SubGradientMethodState
but since `call_manopt_optimizer` will still need to be manually defined for all of them, it would probably be equal work haha
hence it needs a struct to instantiate the Optimizer() . Since Manopt doesn't have it's own types/structs for these we have to create them here.
`GradientDescentState` is exactly such a type/struct. It stores all settings the solver (in this example gradient descent) needs.
The one thing that might be a bit different in Manopt.jl is that `gradient_descent` is the function meant to be used by an end user, and `GradientDescentState` is more like an internal thing. Also, all `AbstractManoptSolverState` subtypes by convention have `M` as their first argument in the constructor, and usually a few mandatory arguments as well (augmented Lagrangian, for example, has two mandatory sets of parameters to specify the sub problem and sub solver).
But at least the ones without sub solvers quite often have a lot of defaults, so then only the manifold is a parameter (the manifold is not stored in the state, just used to, for example, pick a good default retraction).
That is why I am not sure it is worth “inventing” half-states here that store parts of those, instead of just dispatching on `call_manopt_optimizer`. But OK, that function I also have not yet fully understood.
I'd be very happy to not have to reimplement all that for every solver but right now it's impossible based on my understanding, but maybe @mateuszbaran can also suggest if we can avoid this and rely on some more generic interface to solvers in Manopt?
I hope I made clear that I am not 100% sure that works, since I am not yet 100% sure how Optimization.jl works in all details; it just seems the new structs introduce a bit of “double structure”. Yeah, but maybe Mateusz sees that a bit better.
Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem, and the only other option is the `solver_name` functions. They are fairly uniform.
I had a very similar problem when working on hyperparameter optimization for Manopt: https://juliamanifolds.github.io/ManoptExamples.jl/stable/examples/HyperparameterOptimization/ . Lack of a problem-independent solver structure made that code more ugly than I'd like.
Anyway, I think something like this would be a bit nicer than specializing in great detail for each solver:
struct ConcreteManoptOptimizer{TF,TKW<:NamedTuple} <: AbstractManoptOptimizer
    optimizer::TF
    kwargs::TKW
end
ConcreteManoptOptimizer(f) = ConcreteManoptOptimizer(f, (;))

# for example
# solver = ConcreteManoptOptimizer(quasi_Newton)

function call_manopt_optimizer(
    M::ManifoldsBase.AbstractManifold,
    opt::ConcreteManoptOptimizer,
    obj::AbstractManifoldObjective,
    x0;
    stopping_criterion::Union{Manopt.StoppingCriterion,Manopt.StoppingCriterionSet},
    kwargs...)
    opts = opt.optimizer(M,
        obj,
        x0;
        return_state = true,
        stopping_criterion,
        opt.kwargs...,
        kwargs...)
    # we unwrap DebugOptions here
    minimizer = Manopt.get_solver_result(opts)
    return (; minimizer = minimizer,
        minimum = Manopt.get_cost(M, obj, minimizer),
        options = opts)
end
And here:
opt_res = call_manopt_optimizer(manifold, cache.opt, _loss, gradF, cache.u0;
solver_kwarg..., stopping_criterion = stopping_criterion)
You'd have to construct a manifold objective:
obj = ManifoldGradientObjective(_loss, gradF; evaluation=evaluation)
opt_res = call_manopt_optimizer(manifold, cache.opt, obj, cache.u0;
solver_kwarg..., stopping_criterion = stopping_criterion)
or `ManifoldCostObjective(f)` for gradient-free optimization, or `ManifoldHessianObjective(f, grad_f, Hess_f, preconditioner; evaluation=evaluation)` for optimization with Hessian-vector products.
If this doesn't work for some solver, we could either fix it in Manopt or specialize `call_manopt_optimizer` for that solver.
Note that some solvers don't strictly require `x0`, but that is not particularly important at this stage.
Maybe `ConcreteManoptOptimizer` could even exist in Manopt, since it would also help make the hyperparameter optimization example nicer.
Another note, Armijo and WolfePowell are not great line search algorithms. https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl/ is currently the best one available for Manopt. It might get integrated into LineSearches.jl at some point but currently it doesn't have a particularly high priority (see https://github.com/JuliaNLSolvers/LineSearches.jl/pull/174 ).
I do agree that the `solver_name` (high-level) functions were always meant to be the ones used by end users, but could you elaborate on
Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem
? I think the main thing the solver states need is a manifold. Only the ones with sub solvers sometimes have some nice fallbacks, where the sub problem is automatically created from the main objective as a default – but otherwise the user has to specify a full sub solver anyway.
For example, https://manoptjl.org/v0.4/solvers/gradient_descent/#Manopt.GradientDescentState – it sure needs a manifold, like nearly anything in Manopt, but that is also just needed to initialise the internal fields.
Concerning the need for an x0 (or whether a random one is chosen) – we should unify that to always fall back to a `rand`.
I like your approach and if some states do not yet fit that we could check how we can adapt them :)
I do agree that the `solver_name` (high-level) functions were always meant to be the ones used by end users, but could you elaborate on
Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem
? I think the main thing the solver states need is a manifold. Only the ones with sub solvers sometimes have some nice fallbacks, where the sub problem is automatically created from the main objective as a default – but otherwise the user has to specify a full sub solver anyway.
States sometimes also need a point for initialization, and requiring a manifold simply rules them out in hyperparameter optimization; and in your own words:
The newest test already looks quite nice, just that philosophically I would prefer to have the manifold in the objective not the solver.
For example https://manoptjl.org/v0.4/solvers/gradient_descent/#Manopt.GradientDescentState – sure needs a manifold like nearly anything in Manopt, but that is also just needed to initialise the internal fields.
Concerning the need for an x0 (or whether a random one is chosen) – we should unify that to always fall back to a `rand`.
I like your approach and if some states do not yet fit that we could check how we can adapt them :)
This is IMO not a great solution; these states don't look like they were intended to be used like that. And ugly things start happening when the default point doesn't match what the user wants to provide through the Optimization.jl interface. And we have two places where the manifold is needed – not great.
Well, for the point, we should then move to using `rand(M)` for them.
But we will not be able to get rid of the manifold, since we definitely need it for said `rand` – and to initialize (usually zero) tangent vectors. I do not see any way around that.
Sure, the manifold is needed in two places – but I do not see how we can avoid that (as in: at all; it is structurally necessary to initialise some fields and types).
Ideally we should have manifold in only one place, either objective or solver. You suggested you prefer manifold to be in the objective. So we can't use Manopt states to represent solvers here. And IMO states shouldn't really be constructed until we have initial search point. Which gives another reason why states are not suitable here.
The *State should map to OptimizationState, https://docs.sciml.ai/Optimization/stable/API/optimization_state/. The solver structs are for telling `solve` "this is the optimizer I want to use" – the *State types are more for storing how the optimization is progressing, and since it has "State" in its name, it can't directly be used here.
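That distinction can be sketched with two mock types (hypothetical names, neither Optimization.jl nor Manopt.jl code): an immutable "which solver, with which settings" struct that exists before any problem data, versus a run-state struct that can only be constructed once a concrete initial point is known.

```julia
# What the user passes to solve(): just the choice of algorithm + settings.
struct MockGradientDescent
    stepsize::Float64
end

# What tracks progress during solve(): needs a concrete iterate to exist,
# so it can only be constructed once the problem (and x0) is known.
mutable struct MockSolverState{P}
    p::P             # current iterate
    iteration::Int
end

# The state is created from the solver choice plus problem data.
init_state(opt::MockGradientDescent, x0) = MockSolverState(copy(x0), 0)

opt = MockGradientDescent(0.1)
st = init_state(opt, [1.0, 0.0])
```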
Hm. I do not see how states (as they are designed in Manopt.jl) should ever be able to be generated without a manifold; something like storing the iterate requires, at some point, initialising that field (and knowing its type, since all states are parametrised by that). So we cannot do magic.
Ah, now I see – until now I thought “solver struct” was the Optimization.jl alias for state, though I saw the state thingy. Then the state should definitely not be used in the “solver struct” setting – even more so since it is not suitable. The question is now: should we do one “solver struct” for every Manopt solver? Or is a generic one enough (and more flexible)?