DFTK.jl icon indicating copy to clipboard operation
DFTK.jl copied to clipboard

Test on forces derivatives is failing unexpectedly

Open gkemlin opened this issue 3 years ago • 48 comments
trafficstars

In the recent CI from the previous PR's (see eg here), there was some bug happening with the derivative of forces for metal which I wanted to investigate, but this was merged automatically. In the test test/forwardiff.jl, we use rHF silicon with temperature for this test and it happens to fail sometimes, always for this specific case and not any other test. At first I thought this was an error from χ0, but actually the error seems to happen before. For instance, I've tweaked a bit the configuration from test/forwardiff.jl in the code below, and this results in a error from symmetrize_ρ https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/response/chi0.jl#L345 where the assertion from https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/fft.jl#L69 is not passed.

Any ideas what is happening here ? This really only happens for the rHF model, so maybe there is something we miss here but I can't manage to understand what...

using DFTK
using ForwardDiff
using Random

a = 10.26  # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
                   [1 0 1.];
                   [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [[1.02, 0.97, 1.04] / 8, -ones(3) / 8]
temperature = 1e-3

function compute_force(ε1, ε2)
    T = promote_type(typeof(ε1), typeof(ε2))
    # solve rHF for Silicon (metallic)
    pos = positions .+ [ zeros(3), ε1 * [1., 0, 0] + ε2 * [0, 1., 0]]
    model = model_DFT(Matrix{T}(lattice), atoms, pos, []; temperature)
    basis = PlaneWaveBasis(model; Ecut=5, kgrid=[2, 2, 2], kshift=[0, 0, 0])
    is_converged = DFTK.ScfConvergenceDensity(1e-10)
    scfres = self_consistent_field(basis; is_converged,
                                   response=ResponseOptions(verbose=true))
    # return forces
    compute_forces_cart(scfres)
end

derivative_ε1_fd = let ε1 = 1e-5
    (compute_force(ε1, 0.0) - compute_force(0.0, 0.0)) / ε1
end
derivative_ε1 = ForwardDiff.derivative(ε1 -> compute_force(ε1, 0.0), 0.0)
@show norm(derivative_ε1 - derivative_ε1_fd)

gkemlin avatar Oct 03 '22 10:10 gkemlin

I hate symmetriiiiiiiiiiiiiiiiies

antoine-levitt avatar Oct 03 '22 11:10 antoine-levitt

This really only happens for the rHF model

You're sure this is not because of this being a metal?

Edit: This is really because of the way we setup up this particular structure and problem, see below.

mfherbst avatar Oct 04 '22 06:10 mfherbst

I found that a different number of irreducible k-points is used between some of the calculations (8 in the displaced finite-diff, 6 in the displaced and 6 in the forward-diff run). That in combination with the small Ecut and kgrid used, might explain it ... especially given that we model a metal here.

Edit: Ok that's not the full story. Even with a larger kgrid and Ecut it gives differences. Increasing temperature or changing the blowup function also does not help.

mfherbst avatar Oct 04 '22 07:10 mfherbst

Yes, indeed. Even symmetries=false in the model gives different results.

You're sure this is not because of this being a metal?

I've tried with magnesium and I don't manage to reproduce this.

gkemlin avatar Oct 04 '22 12:10 gkemlin

Indeed, the symmetrization of the forces appears to be wrong here, the forces at (0,0) are different with and without symmetries. MWE:

using DFTK
using ForwardDiff
using Random

a = 10.26  # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
                   [1 0 1.];
                   [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [[1.02, 0.97, 1.04] / 8, -ones(3) / 8]
temperature = 1e-3


for sym in (true, false)
    ε1 = ε2 = 0.0
    # solve rHF for Silicon (metallic)
    pos = positions .+ [ zeros(3), ε1 * [1., 0, 0] + ε2 * [0, 1., 0]]
    model = model_DFT(lattice, atoms, pos, []; temperature, symmetries=sym)
    basis = PlaneWaveBasis(model; Ecut=5, kgrid=[2, 2, 2], kshift=[0, 0, 0])
    is_converged = DFTK.ScfConvergenceDensity(1e-10)
    scfres = self_consistent_field(basis; is_converged,
                                   response=ResponseOptions(verbose=true))
    scfres.energies
    # return forces
    display(compute_forces_cart(scfres))
end

I'll try to take alook...

antoine-levitt avatar Oct 05 '22 07:10 antoine-levitt

This appears to be a case of accidental symmetries: changing the numbers in the positions somewhat results in less symmetries. Did I mention I hated symmetries?

antoine-levitt avatar Oct 05 '22 07:10 antoine-levitt

With forces_nonsym the forces obtained from a symmetries=false computation:

julia> DFTK.symmetrize_forces(model, forces_nonsym; symmetries=model.symmetries) - forces_nonsym
2-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-8.005238182409724e-13, 0.004153669530948306, 5.807881302624662e-13]
 [-8.005238182409724e-13, -0.004153669533415452, 5.807881302624662e-13]

(and the same for the basis.symmetries, so that's not the issue). So, assuming our nonsymmetric forces computation is correct (which I think it is), either our symmetry detection or our forces symmetrization routine is wrong...

antoine-levitt avatar Oct 05 '22 07:10 antoine-levitt

I'm a bit stumped there: the symmetry detection should be robust (we get them from spglib, and we recheck them independently), and we debugged the forces symmetrization routine, which agrees with papers. The symmetries do form a group. I remember something about @jaemolihm saying we should discard "accidental" or weird symmetries (meaning symmetries with a w which is not rational or something), but I don't remember what the point was, @jaemolihm do you remember?

antoine-levitt avatar Oct 05 '22 08:10 antoine-levitt

I ran the MWE, I think the issue is different, because sym and nosym results differ even when explicitly setting symmetries to those with simple w.

I find that only the DFTK.TermAtomicNonlocal term contribution is different, so an issue might be there

julia> basis.terms .|> typeof
8-element Vector{DataType}:
 DFTK.TermKinetic
 DFTK.TermAtomicLocal{Array{Float64, 3}}
 DFTK.TermAtomicNonlocal
 DFTK.TermEwald{Float64}
 DFTK.TermPspCorrection{Float64}
 DFTK.TermHartree
 DFTK.TermNoop
 DFTK.TermEntropy


# sym true
forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}
nothing
StaticArrays.SVector{3, Float64}[[0.07038869844203988, -2.886579864025407e-15, 0.10032645829629105], [-0.07038869850082263, -2.220446049250313e-15, -0.10032645830509423]]
StaticArrays.SVector{3, Float64}[[-0.1504997004907923, 0.1603602195724699, -0.17022073865414747], [0.15049970053068973, -0.1603602196038185, 0.17022073867694731]]
StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
nothing, nothing, nothing, nothing]

# sym false
 forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}[
 nothing,
 StaticArrays.SVector{3, Float64}[[0.07038869827965999, -4.0044856319809696e-11, 0.1003264579835943], [-0.07038869827688266, -6.477041125663163e-13, -0.10032645800503182]],
 StaticArrays.SVector{3, Float64}[[-0.04773509506391771, 2.6638774519582853e-11, -0.06745613320749312], [0.04773509506303912, -4.547834331347644e-12, 0.06745613321915436]],
 StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
 nothing, nothing, nothing, nothing]

jaemolihm avatar Oct 05 '22 09:10 jaemolihm

I find that only the DFTK.TermAtomicNonlocal term contribution is different, so an issue might be there

This is because only that term uses symmetrize_forces. The unsymmetrized forces satisfy f = W' * f, not f = W * f, while the symmetrization assues the latter. Maybe another transpose issue..

https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/symmetry.jl#L246

jaemolihm avatar Oct 05 '22 09:10 jaemolihm

Nice detective work! We do make sure that the W is the right real space rotation (spglib.jl:118), so that part should be OK. The formula we have right now seems consistent with (A.27) of https://arxiv.org/pdf/0906.2569.pdf, but probably there's a subtlety, I'll try to rederive it.

antoine-levitt avatar Oct 05 '22 09:10 antoine-levitt

Force means $E(\vec{\delta x}) = \vec{\delta x} \cdot \vec{F}$. Under rotation, $E(W \vec{\delta x}) = (W \vec{\delta x}) \cdot \vec{F}$. Symmetry means $E(\vec{\delta x}) = E(W \vec{\delta x})$. Thus, we have $\vec{\delta x} \cdot \vec{F} = (W \vec{\delta x}) \cdot \vec{F}$. Thus $F_i = F_j W_{ji}$ (in Cartesian coordinates).

jaemolihm avatar Oct 05 '22 09:10 jaemolihm

I agree with this, and it makes sense since forces are covectors. (A.27) says differently, but maybe it's valid in cartesian but not in reduced or something

antoine-levitt avatar Oct 05 '22 09:10 antoine-levitt

Maybe R in the QE paper is S = W', not W?

jaemolihm avatar Oct 05 '22 09:10 jaemolihm

This also works

            f = forces[group[i_other_at]]
            sym_f = covector_cart_to_red(model, matrix_red_to_cart(model, W) * covector_red_to_cart(model, f))
            symmetrized_forces[idx] += sym_f

This does sym_f_red = A^-T A Wred A^-1 A^T Fred, where A is the lattice. It doesn't seem to me that it's equivalent to doing Wred^T Fred, even using the fact that Wcart is orthogonal.

antoine-levitt avatar Oct 05 '22 09:10 antoine-levitt

Maybe R in the QE paper is S = W', not W?

I don't think so, since they write that for every position r, R(r+f) should be an atom of the same type as r, and we check that Wr+w is of the same type as r, so the conventions should be equivalent up to w=Rf (which we do take into account correctly, I hope...)

antoine-levitt avatar Oct 05 '22 10:10 antoine-levitt

This does sym_f_red = A^-T A Wred A^-1 A^T Fred

Another way of writing this is sym_f_red = inv(Wred') f_red. This works also in this example. Your reasoning above says F = W' F, but W' = inv(W), so that F = inv(W) F, and by multiplying by W, F = W F, so it's also consistent with that.

antoine-levitt avatar Oct 05 '22 10:10 antoine-levitt

I did some more tests with the fix proposed in #744 (which is definitely the right direction to follow) and I think the issue with the failing tests has to do with the way we treat symmetries in the response. Because now the value part in the forces seems very reasonable also in the forwarddiff calculation, but the dual part does not look right, e.g. using Gaspard's script

derivative_ε1_fd = [[-0.010188772141717269, 0.5540641933043239, 0.5503864129594938], [0.010187937546130184, -0.5540640837126478, -0.5503858548851172]]
derivative_ε1 = [[1.8376265562014128e6, 5.507626029595309e6, -1.837625369839999e6], [-1.8376253613343276e6, -5.50762642314619e6, 1.8376267549620424e6]]

but when I run it again:

derivative_ε1_fd = [[-0.01018562048691208, 0.5540706243039802, 0.5503838137466783], [0.010186103892827203, -0.5540667950994921, -0.5503840477318837]]
derivative_ε1 = [[0.576337577783623, 0.6292566906509841, 0.6100238361659679], [0.624274131079789, -1.0219749965960532, 0.7693535838844899]]

and without symmetries:

derivative_ε1_fd = [[-0.0101852412196673, 0.5540697222225325, 0.5503834867633052], [0.010185991221832707, -0.5540706391081104, -0.5503840552398427]]
derivative_ε1 = [[-0.010202755281099029, 0.5540729798449112, 0.5503845518212458], [0.010202757408728927, -0.5540730086009197, -0.5503845604364989]]

I lack the time for today to dig further, but at least a step.

mfherbst avatar Oct 05 '22 14:10 mfherbst

Looks terribly also like some uninitialised value or something like that.

mfherbst avatar Oct 05 '22 14:10 mfherbst

... or the perturbation breaks one of the symmetries and we filter that out during the response calculation?

mfherbst avatar Oct 05 '22 16:10 mfherbst

Haven't looked closely but plausibly the dV is zero and we normalize it in chi0 for numerical purposes, which breaks things somehow?

antoine-levitt avatar Oct 05 '22 16:10 antoine-levitt

Ok, so I'm trying to summarize what happens :smile: If I'm not mistaking, we are currently facing two issues :

  1. the first one is the symmetry thing spotted by the script I showed at the top of this thread, see also Michael's post https://github.com/JuliaMolSim/DFTK.jl/issues/742#issuecomment-1268479029, which works if symmetries are disabled but doesn't work otherwise
  2. the second one is the crazy gmres spotted in https://github.com/JuliaMolSim/DFTK.jl/pull/744 where we get too big values in ForwardDiff.

Starting from there, I think have spotted the point 2) bug from https://github.com/JuliaMolSim/DFTK.jl/pull/744. Sorry in advance for the long message! Actually I'm not sure it comes from symmetry but rather from some badly initialized value, as already mentioned. When we solve the Dyson equation here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L161 δρ0 already has huge norm. This comes from the first application of apply_χ0_4P, where one of the solves of the Sternheimer fails. The Sternheimer tolerance is set to tol/10 where tol in ForwardDiff is scfres.norm_Δρ. That makes tol_sternheimer to 4.12e-12 and the behavior of the Sternheimer is crazy for one band of the gamma point, where the cg gets stuck for 40 iteration at 1e-11 before going up and down again, resulting in a δψk[:,n] with huge norm (see verbose in this gist). I'm not sure I understand why this particular band get stuck with the CG at 1e-11, but here are some tracks to follow.

  • if we set tol_sternheimer to tol instead of tol/10 here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L129 then the test passes.
  • if we don't use extra bands, it works. This is rather bad news because we are supposed to exactly avoid this type of things with the extra bands :laughing: .
  • this comes from the diagonal matrix Diagonal(εk_extra .- εk[N]) to have small values: for the band for which the Sternheimer fails, the first value on the diagonal is εk[N+1] - εk[N] = 0.00107679, and both eigenvalues are converged enough, with occk[N]>1e-6 and occk[N+1]~1e-7. I would say that the problem is not the Schur (because it works perfectly without the extra bands, and we know it has a worse conditioning number then), but rather the way the inverse of Diagonal(εk_extra .- εk[N]) is applied, which accumulates bigger and bigger numbers during the stagnation of the cg. @antoine-levitt any ideas on how to improve this part to avoid this ?
  • if we set the default occupation to 1e-8 rather than 1e-6, everything works fine.

This does not solve the symmetry issues, but at least we now know where this crazy gmres comes from :upside_down_face:

gkemlin avatar Oct 06 '22 10:10 gkemlin

Regarding the large norm in apply_\chi0 we already do something against this, namely here: https://github.com/JuliaMolSim/DFTK.jl/blob/16c1ef1e214e01ce65a7f67429a0ab31352a84f9/src/response/chi0.jl#L351-L353

It should not be a problem to do something similar in solve_ΩplusK_split as well (or in the forward-diff rule calling it) and I'd not be surprised if this fixes all the issues.

I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.

mfherbst avatar Oct 06 '22 11:10 mfherbst

I'm not sure I see what you mean : If the δρ0 out of the first apply_χ0_4P is completely screwd up, how can we deal with it ? If we divide δρ0 by it norm and then re-multiply with it, this won't solve the issue ?

I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.

Yes, I agree with that.

gkemlin avatar Oct 06 '22 11:10 gkemlin

I'm not sure I see what you mean : If the δρ0 out of the first apply_χ0_4P is completely screwd up, how can we deal with it ?

You deal with it even before ... by scaling the RHS, i.e. the δHψ. That should work, no?

mfherbst avatar Oct 06 '22 11:10 mfherbst

I've normalized the rhs of every sternheimer (ie δHψkn is normalized) and now the CG stagnates at 6e-12 instead of 1e-11 ! The tolerance we're asking from ForwardDiff is 5e-12... I suggest two possibilities:

  • either we agree that it's only numerical issues that are unlikely to happen anywhere else than in our test and we can take 1e-9 for the convergence of the SCF in the test instead of 1e-10, so that ForwardDiff asks for higher tolerance. This seems ok for me because asking for tolerance 1e-10 on the density means that the energy is already at machine precision.
  • either we try to adapt the tolerance of the Sternheimer: this stagnation is likely to occur for the highest occupied band, with occupation number of order 1e-6 by default, so we can set the tolerance in the cg to abstol/sqrt(occ[ik][n]) instead. This won't cause any issue because we multiply by occ[ik][n] afterwards, but this allows for a bit more flexibility on the sternheimer for the last occupied bands.

Both solutions work and seem good to me, I'll be happy to have your opinion on that !

gkemlin avatar Oct 06 '22 13:10 gkemlin

The trace you linked with the residual going down and then up and down is very characteristic of "oversolving": when you solve iteratively Ax=b where A has a nullspace (which is not exactly zero), and you are asking for really tight tolerances, at some point the iterative solver is going to do figure out that there is something nonzero in the nullspace, and it's going to want to invert that (with catastrophic results, of course). We should try to figure out why that happens, let's discuss it sometime. Regarding solutions, projecting out the nullspace might not work, because the nullspace does not necessarily numerically commute with A. I'm not aware of a general solution to the problem, but the QE technique of shifting it away from zero might work...

antoine-levitt avatar Oct 06 '22 18:10 antoine-levitt

Maybe the idea of adapting the Sternheimer tolerance depending on the occupation might at least be a way out. Indeed I would expect more issues close to the Fermi level (e.g. rotations in the p or d orbitals, which could be such a nullspace). I agree we should investigate this more thoroughly, so I would stay very conservative for now. abstol/sqrt(occ[ik][n]) might be a bit too loose though. We should in any case investigate this on some non-trivial examples before introducing it.

mfherbst avatar Oct 06 '22 18:10 mfherbst

That's probably not the nullspace that matters, I meant simply the one from the projection onto the occupied bands. If these are converged to some eps and you solve the sternheimer to less than eps, you're in trouble. It'd be nice to find a proper way to solve this once and for all and not rely on heuristics

antoine-levitt avatar Oct 06 '22 20:10 antoine-levitt

I agree, but perhaps for now we should just ensure in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/response/hessian.jl#L129 to not overdo it as @gkemlin suggested. I actually did some experiments in #737 and found that tol_sternheimer=tol is ok in the initial and final apply_χ0_4P and only in the inner apply_χ0 I took the minimal tolerance to be tol / 10. I assume that with the rescaling @gkemlin tried should stabilise the solver sufficiently for now.

Other thoughs?

mfherbst avatar Oct 11 '22 08:10 mfherbst