DFTK.jl [Discussion] `direct_minimization` with Manopt.jl

trafficstars

Following the discussion in #1102, here is a work in progress on moving direct_minimization to an extension and to use Manopt.jl.

Due to the discussions in the linked issue I feel a bit unsure how and whether to continue. Currently left to do

[x] 1 document direct_minimization in src/scf/ that it requires Manopt and Manifolds to work
[x] 2 (maybe) rename ManoptPreconditioner to maybe something more reasonable. It basically wraps a DFTK preconditioned
[x] 3 CostGradFunctor! should get a nicer name and could maybe also become the fixed energy/gradient computation but also maybe stay a keyword argument
[x] 4 Document CostGradFunctor!
~~[ ] 5 clarify what the use of alphaguess is and what else than `nothing it can/should be.~~(removed it)
[x] 6 the preconditioner has to be passed differently to gradient_descent/cg than Quasi newton, this needs to be handled somewhat.
[ ] 7 discuss what to do in the end with the info or in other words what to return from that function.
[ ] 8 add tests and see where we can cover this in the docs

Points 1-4 and 8 are merely to work though as soon as I find time On 5 and 7 I need some help and feedback 7 is maybe a bit tricky to get right in general.

For now this function (and its helpers) are a bit work in progress – I open this a bit early, since I am not sure the philosophies of the packages agree well enough that this approach is useful. in the issue above it sounds currently like the idea here is that the philosophy is too different and that it might not fit.

Jun 03 '25 14:06 kellertuer

I had a meeting that was shorter than I thought, so here is a small snippet that somewhat works on this branch already (thanks to @elorannug for the setup)

using LinearAlgebra, DFTK, Manopt, Manifolds, RecursiveArrayTools, Random
using LineSearches
using PseudoPotentialData, AtomsBuilder, Unitful, UnitfulAtomic

#
#
# Load data and setup model
Random.seed!(1)
Ecut = 5 # Energy cut-off.   Realistic value: Ecut = 20
kg = 1   # Number of k-points Realistic value:   kg = 5
tol = 1e-1 # Tolerance for the minimization, default is 1e-6
# Set up the model using planewave basis
pseudopotentials = PseudoFamily("dojo.nc.sr.pbesol.v0_4_1.standard.upf");
model = model_DFT(bulk(:Si); functionals=PBEsol(), pseudopotentials);
basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid=[kg, kg, kg]);
model_to_save_psi = basis.model
filled_occ_s = DFTK.filled_occupation(model_to_save_psi)
n_spin_s = model_to_save_psi.n_spin_components
n_bands_s = div(model_to_save_psi.n_electrons, n_spin_s * filled_occ_s, RoundUp)
random_ψ = [DFTK.random_orbitals(basis, kpt, n_bands_s) for kpt in basis.kpoints]

direct_minimization(basis;
    ψ=random_ψ,
    tol=tol,
    maxiter=1000,
    #TODO: Check that it also works with: other manifolds, retractions, vector transports, debug & record:
    debug = [DebugCost(), DebugGradientNorm(), "\n", 10, :Stop],
    # record = [:Cost, :GradientNorm],
    # manifold = Grassmann,
    # stepsize = Manopt.LineSearchesStepsize(LineSearches.BackTracking()),
)

It still requires a bit of careful default parameter tuning, since the default (Armijo) line search currently uses the QR retraction instead of the default we prescribe and such. But a very first small generic interface does work.

Jun 04 '25 08:06 kellertuer

While I did not yet address the points above, I spent this evening debugging the code, now it is relatively easy to

change the solver
change the manifold
change the linesearch
change the retraction
change the vector transport
activate debug & record
change the stopping criterion (though that “deactivates” the values passed to tol and maxiter

Every now and then it seems the gradient is not so correct because the step size collapses after a few iterations, I have to check that (or the example is too simple so it finishes after < 4 steps anyways).

6 is nearly done, just requires a bit of tolerance on the gradient descent side in manopt; Feedback on 5 and 7 would be nice; besides a bit of debugging the rest, only documentation and testing (and maybe a bit of renaming) is missing.

Jun 06 '25 21:06 kellertuer

I worked a bit mainly on a bit of testing (as far as my knowledge reaches), naming and documentation.

This should work for now, with the small caveat that for GradientDescentState (i.e. using gradient descent) as well as to use callbacks, we have to wait for the next version of Manopt.jl to be released. Besides waiting for these, the PR is finished up to the following points to discuss

Questions on the code

(also written within the code in the corresponding places)

[x] In the stopping criterion – if I just know the manifold, can I then already compute (at least the size/type) of the density ρ? That would make a much nicer constructor only requiring the tolerance. (This is currently not possible since from M we can not infer what ρ was)
[x] Does it makes sense to be able to record ρ ? A user could then easily use `record = [:ρ] to record if we do an easy recorder (but maybe also no one likes that here). Provided, maybe it is useful for someone
[x] a debug is maybe not so useful, since it seems to be a large array, but its norm maybe? (Implemented)
[x] Should a user be able to provide their own cost/grad? (see ...and beyond below as well) Then we would have to change a few small things in the setup. (-> Implemented)
[x] What should the debug_info contain? (There is a list but it is not yet realised)

Questions in general

[x] does DFTK have a certain code style I should apply? (not very strict it seems)
[x] does DFTK want to use DocumenterCitations.jl for literature? I could set it up, either here or in a separate PR (Setup, but needs some fine tuning)
[x] does DFTK want to use DocumenterInterlinks.jl to link to other-peoples-package-docs? (Setup, but not yet used)

...and beyond

I had a discussion with @jonas-pueschel to also include his CG approache; that however would first mean, that in severa places, especially in Armijo and Wolfe, the inner product with the gradient has to ne replaced an evaluation of the differentia, see https://github.com/JuliaManifolds/Manopt.jl/issues/482. Then we can probably either update the cost/grad here or provide a second alternative.

Jun 17 '25 14:06 kellertuer

Yes a zoom is probably the best, in Text it might take a few iterations too much.

edit: one that could easily be resolved I already resolved – and added a few notes to the comments, maybe also just for me to remember to discuss this :)

Jun 19 '25 11:06 kellertuer

About the return values, I think this would be a very good initial set:

ham
basis
energies (the first result of energy_hamiltonian)
converged (true or false)
ρ,
occupation
n_iter (iteration count)
n_matvec (number of hamiltonian application ... if it's easy to obtain, else don't bother)
ψ
history_Δρ, history_Etot (changes of ρ in each step, changes of cost function (total energy) in each step)
algorithm="StiefelDM" (some short description o the algorithm that has been employed
runtime_ns (The total run time in nanoseconds if you can easily record that)

Jul 01 '25 09:07 mfherbst

All of those are reasonably easy to get, I think.

Jul 01 '25 09:07 kellertuer

Update from the last commits: I still have to check a few things for the named tuple, like the number of iterations and the spent time, and afterwards I have to write the example and check that the other examples still run fine. Currently some fail, but that might be due to the named tuple being incomplete.

I also want to check the InterLinks and new references a bit and check that the new doc strings show up in the docs (but interestingly documenter does not complain for now)

Jul 04 '25 15:07 kellertuer

@mfherbst I think I have finished everything we discussed and I was able to do. Here is a few remaining things I am unsure about

[x] documentation: I am currently not sure what is documented where, since I have not yet understood the structure of the docs. running them locally does at least not throw errors that docs are missing, but it currently also does not seem to load/document extensions.
[x] the wannier example from the ecosystem errors when running the docs, it seems that is unrelated to what I did.
[x] every now and then in the examples/gross_pitaevskii.jl-L89 fails and I have not yet found out why I checked this example more closely and the line fails with
```
 julia> E.total
 147.37290591541824

 julia> scfres.energies.total
 147.37290591541844
```
it is just a nuance, and I think this is somewhere a rounding error since it just stems from evaluating the energy an the same point. edit: it is probably just rounding errors.
[x] I can run the tests locally and they run really long, but it seems all of them pass. Indeed after 4.5 hours, only a few tests also concerning the wannier90 fail, so that is unrelated to this PR.
[x] currently direct_minimization can not yet report result.converged see the explanation here

I think this is how far I can provide code here. For a next step I would probably need help. Feedback on the code is of course also welcome.

Jul 07 '25 19:07 kellertuer

This is a careful bump after 2 months without any reaction on my last post with questions and/or a review.

If this is too much work or too much of a change to ask, we can also not do this.

Sep 09 '25 08:09 kellertuer

Thanks for the reminder and sorry for the delay. I'll try to follow up in the next days.

Sep 09 '25 10:09 mfherbst

every now and then in the examples/gross_pitaevskii.jl-L89 fails

If it's just a small energetic glitch as you mention it, I would not be too worried. But if fails means it crashes, then I am.

while most old examples still run, we could consider writing a separate example for this new solver, maybe using Grassmann?

We can, but not extremely important from my point of view. If you have it handy, it does not hurt, though.

the internal InsulatorEnergy currently stores both occupation and filled_occ, since as I wrote, I am not able to change that with the knowledge I have.

If it's clear now feel free to do it. If not, I'll do a cleanup pass and then I can take a look.

documentation: I am currently not sure what is documented where, since I have not yet understood the structure of the docs. running them locally does at least not throw errors that docs are missing, but it currently also does not seem to load/document extensions.

I don't know about the extensions. Basically what we do is that we re-use the examples also as documentation. Is that your confusion ? It may also be that the literate step causes trouble ?

Sep 13 '25 18:09 mfherbst

To answer a few already:

every now and then in the examples/gross_pitaevskii.jl-L89 fails

If it's just a small energetic glitch as you mention it, I would not be too worried. But if fails means it crashes, then I am.

to me it looks like small glitches.

while most old examples still run, we could consider writing a separate example for this new solver, maybe using Grassmann?

We can, but not extremely important from my point of view. If you have it handy, it does not hurt, though.

Mainly a matter of time, and I fear it would take me quite some time.

the internal InsulatorEnergy currently stores both occupation and filled_occ, since as I wrote, I am not able to change that with the knowledge I have.

If it's clear now feel free to do it. If not, I'll do a cleanup pass and then I can take a look.

to me this is to 0% clear.

documentation: I am currently not sure what is documented where, since I have not yet understood the structure of the docs. running them locally does at least not throw errors that docs are missing, but it currently also does not seem to load/document extensions.

I don't know about the extensions. Basically what we do is that we re-use the examples also as documentation. Is that your confusion ? It may also be that the literate step causes trouble ?

Well. To be precise: The new doc strings are defined in the extension. They are not included in the documentation at all. But the documentation also does not error, like it usually does when doc strings are not included in the docs. The simple plain reason is, that the documentation is not aware the extension exists. See https://github.com/JuliaManifolds/Manopt.jl/blob/e4a5872c097ec839b799c5d4cacd314d9a5837f6/docs/make.jl#L162-L169 how you have to add them (and of course also have to have loaded them beforehand). I think this is unrelated to literate.

Sep 14 '25 18:09 kellertuer

I think it's converging. Take a look what you can fix with the additional information from my end, else I'd suggest we have a call to clean things up and then I can take over the rest whatever is needed. Things we should clarify:

How does the printing look like

If you could make more precise what you mean here?

Anything else to be discussed on docs building ?

As written in the post above, currently 0 documentation of extensions seems to be in the documentation.

[x] Source code formatting: We try to keep line length around 92 (see the scarce style guide https://docs.dftk.org/stable/developer/style_guide/)

Been there, done that.

Sep 15 '25 07:09 kellertuer

Since we released has_converged yesterday, I readied the converged= field to the named tuple as well.

Sep 15 '25 08:09 kellertuer

I tried at least illustrating the documentation of things in the extension

load all packages to trigger loading of the extension
adding the extension to the modules in makedocs
collect all those doc strings on a separate page

When trying to do 3 I usually check which signatures fail to add them on that page – otherwise it is a bit hard to specify that by hand.

However, I can not render the docs locally, the Error I get is (starting snippet)

┌ Error: failed to run `@example` block in src/ecosystem/wannier.md:177-198
│ ```@example wannier
│ using wannier90_jll  # Needed to make run_wannier90 available
│ run_wannier90(scfres;
        [... skipped rest of code block that is run ... ]
│              );
│ nothing #hide
│ ```
│   exception =
│    UndefVarError: `wannier90` not defined in `wannier90_jll`
│    Suggestion: check for spelling errors or missing imports.
│    Stacktrace:
│      [1] (::DFTKWannier90Ext.var"#2#3"{String, String, String})()

I would assume that is unlrelated to this PR? However, the last commit is hence only a first step towards documenting the extension.

I checked the other extensions, and only one has a one-line (more-like-dummy) doc string. So maybe for other extensions this is not necessary. For the one here, I am unsure how to continue – also not for the other open points.

Let me know what best to do then.

Sep 17 '25 16:09 kellertuer

to me it looks like small glitches.

ok, good then.

Mainly a matter of time, and I fear it would take me quite some time.

then let's not

to me this is to 0% clear.

Ok then I do it. Is it ok if I push to your branch directly ?

If you could make more precise what you mean here?

How do the output look like when running the direct minimisation with ManOpt.

As written in the post above, currently 0 documentation of extensions seems to be in the documentation.

Ok, but then all it takes is to add the Base.get_extension calls, right ?

I would assume that is unlrelated to this PR?

I would think so. Are you on a windows machine ? Because Wannier90 does not compile on windows (hence no binary in Yggdrasil).

Sep 18 '25 10:09 mfherbst

Mainly a matter of time, and I fear it would take me quite some time.

then let's not

Thanks for your understanding.

to me this is to 0% clear.

Ok then I do it. Is it ok if I push to your branch directly ?

Sure, feel free to change on this branch what is necessary or what ever you like :)

If you could make more precise what you mean here?

How do the output look like when running the direct minimisation with ManOpt.

Its Manopt (no capital o) – but sure I can check that again and post that here, though maybe only tomorrow

As written in the post above, currently 0 documentation of extensions seems to be in the documentation.

Ok, but then all it takes is to add the Base.get_extension calls, right ?

And loading the packages before, the ? get_extension` only returns extensions that are loaded already. I already extended the docs environment (to inlucde Manopt, Manifolds and RecursiveArrayTools) and then the extension is loaded and added. Just that I could not yet render the docs then.

I would assume that is unlrelated to this PR?

I would think so. Are you on a windows machine ? Because Wannier90 does not compile on windows (hence no binary in Yggdrasil).

No I am on a Mac (macOS Tahoe 26.0, just updated yesterday)

Sep 18 '25 11:09 kellertuer

It's been a while again. Anything I can help with?

I think the one thing missing is still the occupation vs filled_occ that I still can not fix because I do still not yet understand how one can avoid the duplication.

Besides that

[ ] How does the printing look like

If you could make more precise what you mean here?

This was not yet answered. I am happy to provide show methods, but I can not just guess what is the usual thing one want the printing to look like?

[ ] Anything else to be discussed on docs building ?

As written in the post above, currently 0 documentation of extensions seems to be in the documentation.

Same here: Currently 0% of extension documentation is rendered, since extensions are not included in the modules list of the documentation. If that is how it should be, then all things for the docs have been discussed.

and finally (and fourth) the issue that now 3 packages have to be loaded. I do not have a good solution for that. If that is too annoying, then we should maybe discontinue this and stick to the old scheme of using optim, that is also fine with me, then just close this PR.

Oct 08 '25 08:10 kellertuer

I think the one thing missing is still the occupation vs filled_occ that I still can not fix because I do still not yet understand how one can avoid the duplication.

As I said, I'll look into it, but my coding time is limited, unfortunately. But it's still on my list.

If you could make more precise what you mean here?

I meant quite naively, what does the printed output look like if I run the code. But I can check when I look at the first point .

Same here: Currently 0% of extension documentation is rendered, since extensions are not included in the modules list of the documentation. If that is how it should be, then all things for the docs have been discussed.

I thought you suggested this can be changed, which sounded very reasonable to me. You also mentioned cross-referencing documentation. These are things which would be nice to have and I just wondered if we are all set on this or if more things need to be discussed.

and finally (and fourth) the issue that now 3 packages have to be loaded. I do not have a good solution for that. If that is too annoying, then we should maybe discontinue this and stick to the old scheme of using optim, that is also fine with me, then just close this PR.

Yeah, I think that will be too annoying. But I think we can work around that by making a small DFTKManopt package, which then hard-depends on all there dependencies. If this exposes (@rexports maybe) Manopt, Manifolds and the third package then you use that in your extension. That would mean only one extra package needs to be loaded by the user.

Oct 08 '25 14:10 mfherbst

Same here: Currently 0% of extension documentation is rendered, since extensions are not included in the modules list of the documentation. If that is how it should be, then all things for the docs have been discussed.

I thought you suggested this can be changed, which sounded very reasonable to me. You also mentioned cross-referencing documentation. These are things which would be nice to have and I just wondered if we are all set on this or if more things need to be discussed.

Yes, but to me it was unclear whether that is wanted - and currently I can not do that since I can not get the docs to render locally, see the problem with wannier. And sure cross-refs could be introduced then as well, but I would prefer to also do that not “blind” (without being able to render the docs myself). So probably someone else has to take that over.

and finally (and fourth) the issue that now 3 packages have to be loaded. I do not have a good solution for that. If that is too annoying, then we should maybe discontinue this and stick to the old scheme of using optim, that is also fine with me, then just close this PR.

Yeah, I think that will be too annoying. But I think we can work around that by making a small DFTKManopt package, which then hard-depends on all there dependencies. If this exposes (@rexports maybe) Manopt, Manifolds and the third package then you use that in your extension. That would mean only one extra package needs to be loaded by the user.

RecursiveArrayTools is the third one. I have no experience with @reexport, but I will leave that then also do you, so I am unsure how to do this.

But ok, then I know now, that I currently can not help any further, but I am of course available if there are things that need to be clarified further, so I will keep myself subscribed here, but put it aside (no more reminders) now then.

Oct 08 '25 16:10 kellertuer

But ok, then I know now, that I currently can not help any further, but I am of course available if there are things that need to be clarified further, so I will keep myself subscribed here, but put it aside (no more reminders) now then.

Sounds good. I'll @mention you when I'm done or need help.

Oct 08 '25 18:10 mfherbst

DFTK.jl DFTK.jl copied to clipboard

[Discussion] `direct_minimization` with Manopt.jl

Questions on the code

Questions in general

...and beyond

DFTK.jl
DFTK.jl copied to clipboard