DFTK.jl
DFTK.jl copied to clipboard
Document that it is not save to reuse SCF callbacks
I wrote an optimizer to predict bonding energies, but I am currently running into issues with the Eigensolver not converging in the self_consistent_field function. I'm curious what could be causing this issue.
Did you request super small tolerances? Can you post a MWE?
No, the tol is at it's default value. I might be able to, but the program is quite large and fractured. Would that mean that the issue would most likely lie in the setup of the scf? Or could it be in the PlaneWaveBasis?
I used to have this issue when I was recycling the electron density information for the next calculation. The fix for that was to stop recycling the electron density.
The first couple solved scf's do not experience this issue. Only after a few solved scf iterations does this start to happen.
Could it be that you are running a very large calculation? How many electrons and bands do you have?
It could be that our diagtol heuristics is simply very bad for large problems and requests a tolerance that cannot be delivered by the LOBPCG in 200 iterations. We probably need to adapt the heuristics for such cases ... I remember looking at that in QE and finding out that they do some more tricks. We should probably test our heuristics some more on these cases.
Number of electrons: 34 Number of bands = 41
The above file is a MWE. The file has 1281 lines, so if you have any questions just ask. The functions in the file are a part of a package I am developing called Dino.
You should be able to copy and paste this code into a julia file and run it.
It takes a few iterations in either the lattice optimization, or the atom optimization for the Eigensolver to not converge.
NOTE: This file does create files for saved data.
I found was was happening. The tolerance on diagonalize_all_kblocks kept shrinking after each completed scf in the optimizer. Any recommendations for a minimum tolerance for ScfDiagtol()?
I believe the issue lies in the line
info.n_iter == 2 && (diagtol_max /= 5) # Enforce more accurate Bloch wave line in scf_callbacks.jl.
The tolerance will continue to drop by a factor of 5 even past the 100eps(real(eltype(info.ρin))) value.
EDITS: Added more necessary information
After some more tests I found that the clamp function isn't working as expected
I'm not sure how, but after a hacky fix the first iteration could still drop below the allowed minimum value. This suggests that some information is being passed from one self_consistent_field call to the next.
It looks like you are calling ScfDiagtol, saving it and then reusing it for several calls to self_consistent_field(). That's indeed going to do bad things, and not expected at all by our code (and I'm not sure we could solve that elegantly). Just use a new ScfDiagtol struct for each call to self_consistent_field. Looking at your code, I see that you specify explicitly the kwargs to self_consistent_field, why are you doing this?
To be clear: ScfDiagtol is a closure, ie it stores some information implicitly by capturing its kwarg in the inner function
Actually not only ScfDiagtol is a closure, also some of the other callbacks are. It is generally not save to reuse them. Rather you should reuse the arguments to construct them.
One way we could avoid this problem is by converting them to proper structs and implementing a reset mechanism, but I'm not too sure if that not overcomplicates things.
edit: In your example this concerns in particular the callback, determine_diagtol and is_converged fields of the arguments datastructure.
I think we should add a note to the documentation that makes this statefulness of the callbacks more clear.
The reason as to why I specify the kwargs explicitly is because the end user of this package wouldn't have direct access to the self_consistent_field function. I didn't know about closure functions, but they make sense. Sorry about the confusion. I've since made the necessary changes and it works properly now.
I don't know what you want to do but you can do eg f(scf_args) = (...; self_consistent_field(basis; scf_args...)); f((;tol=1e-4))
Also the merge function for named tuples is super useful for this kind of things
But that causes trouble for the diagtol mechanism for example. There you need to go to the next level of indirection (i.e. cache the arguments passed to diagtol rather than the diagtol kwarg). I've fallen into that trap as well.
Yeah, @mfherbst . I've gotten to that point as well.
Yeah two people falling into that trap sounds like we should do something about it, so good you raised the issue @umbriquse .