DFTK.jl icon indicating copy to clipboard operation
DFTK.jl copied to clipboard

Document that it is not save to reuse SCF callbacks

Open umbriquse opened this issue 4 years ago • 20 comments
trafficstars

I wrote an optimizer to predict bonding energies, but I am currently running into issues with the Eigensolver not converging in the self_consistent_field function. I'm curious what could be causing this issue.

umbriquse avatar Jul 26 '21 15:07 umbriquse

Did you request super small tolerances? Can you post a MWE?

antoine-levitt avatar Jul 26 '21 15:07 antoine-levitt

No, the tol is at it's default value. I might be able to, but the program is quite large and fractured. Would that mean that the issue would most likely lie in the setup of the scf? Or could it be in the PlaneWaveBasis?

umbriquse avatar Jul 26 '21 15:07 umbriquse

I used to have this issue when I was recycling the electron density information for the next calculation. The fix for that was to stop recycling the electron density.

umbriquse avatar Jul 26 '21 15:07 umbriquse

The first couple solved scf's do not experience this issue. Only after a few solved scf iterations does this start to happen.

umbriquse avatar Jul 26 '21 16:07 umbriquse

Could it be that you are running a very large calculation? How many electrons and bands do you have?

It could be that our diagtol heuristics is simply very bad for large problems and requests a tolerance that cannot be delivered by the LOBPCG in 200 iterations. We probably need to adapt the heuristics for such cases ... I remember looking at that in QE and finding out that they do some more tricks. We should probably test our heuristics some more on these cases.

mfherbst avatar Jul 26 '21 16:07 mfherbst

Number of electrons: 34 Number of bands = 41

umbriquse avatar Jul 26 '21 16:07 umbriquse

issue.txt

The above file is a MWE. The file has 1281 lines, so if you have any questions just ask. The functions in the file are a part of a package I am developing called Dino.

You should be able to copy and paste this code into a julia file and run it.

It takes a few iterations in either the lattice optimization, or the atom optimization for the Eigensolver to not converge.

NOTE: This file does create files for saved data.

umbriquse avatar Jul 26 '21 21:07 umbriquse

I found was was happening. The tolerance on diagonalize_all_kblocks kept shrinking after each completed scf in the optimizer. Any recommendations for a minimum tolerance for ScfDiagtol()? I believe the issue lies in the line info.n_iter == 2 && (diagtol_max /= 5) # Enforce more accurate Bloch wave line in scf_callbacks.jl. The tolerance will continue to drop by a factor of 5 even past the 100eps(real(eltype(info.ρin))) value.

EDITS: Added more necessary information

umbriquse avatar Jul 27 '21 22:07 umbriquse

After some more tests I found that the clamp function isn't working as expected

umbriquse avatar Jul 27 '21 22:07 umbriquse

I'm not sure how, but after a hacky fix the first iteration could still drop below the allowed minimum value. This suggests that some information is being passed from one self_consistent_field call to the next.

umbriquse avatar Jul 27 '21 22:07 umbriquse

It looks like you are calling ScfDiagtol, saving it and then reusing it for several calls to self_consistent_field(). That's indeed going to do bad things, and not expected at all by our code (and I'm not sure we could solve that elegantly). Just use a new ScfDiagtol struct for each call to self_consistent_field. Looking at your code, I see that you specify explicitly the kwargs to self_consistent_field, why are you doing this?

antoine-levitt avatar Jul 28 '21 06:07 antoine-levitt

To be clear: ScfDiagtol is a closure, ie it stores some information implicitly by capturing its kwarg in the inner function

antoine-levitt avatar Jul 28 '21 06:07 antoine-levitt

Actually not only ScfDiagtol is a closure, also some of the other callbacks are. It is generally not save to reuse them. Rather you should reuse the arguments to construct them.

One way we could avoid this problem is by converting them to proper structs and implementing a reset mechanism, but I'm not too sure if that not overcomplicates things.

edit: In your example this concerns in particular the callback, determine_diagtol and is_converged fields of the arguments datastructure.

mfherbst avatar Jul 28 '21 07:07 mfherbst

I think we should add a note to the documentation that makes this statefulness of the callbacks more clear.

mfherbst avatar Jul 28 '21 07:07 mfherbst

The reason as to why I specify the kwargs explicitly is because the end user of this package wouldn't have direct access to the self_consistent_field function. I didn't know about closure functions, but they make sense. Sorry about the confusion. I've since made the necessary changes and it works properly now.

umbriquse avatar Jul 29 '21 16:07 umbriquse

I don't know what you want to do but you can do eg f(scf_args) = (...; self_consistent_field(basis; scf_args...)); f((;tol=1e-4))

antoine-levitt avatar Jul 29 '21 16:07 antoine-levitt

Also the merge function for named tuples is super useful for this kind of things

antoine-levitt avatar Jul 29 '21 16:07 antoine-levitt

But that causes trouble for the diagtol mechanism for example. There you need to go to the next level of indirection (i.e. cache the arguments passed to diagtol rather than the diagtol kwarg). I've fallen into that trap as well.

mfherbst avatar Jul 29 '21 17:07 mfherbst

Yeah, @mfherbst . I've gotten to that point as well.

umbriquse avatar Jul 29 '21 17:07 umbriquse

Yeah two people falling into that trap sounds like we should do something about it, so good you raised the issue @umbriquse .

mfherbst avatar Jul 29 '21 17:07 mfherbst