
Finite temperature Newton & AD

Open antoine-levitt opened this issue 4 years ago • 19 comments

cc @gkemlin @niklasschmitz

DFT is all about the mapping Hext -> P. For AD/Newton we need the map dHext -> dP. Conceptually dP = (Ω+K)^-1 dHext. This is feasible in plane waves because 1) we don't need the full Hext, only its action on the occupied ψ, and 2) we don't need the full dP, only a "compressed" representation. At zero temperature dP is represented by a set of dψ orthogonal to the ψ. At finite temperature this is a set (dψ, docc, dεF).
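To spell out the finite-temperature representation (just a sketch in my notation, writing the occupations as f_n = f((ε_n - εF)/T)):

```latex
P  = \sum_n f_n\, \psi_n \psi_n^\ast
\qquad\Longrightarrow\qquad
dP = \sum_n f_n \left( d\psi_n\, \psi_n^\ast + \psi_n\, d\psi_n^\ast \right)
   + \sum_n df_n\, \psi_n \psi_n^\ast,
\qquad
df_n = f'\!\Big(\tfrac{\varepsilon_n - \varepsilon_F}{T}\Big)\,
       \frac{d\varepsilon_n - d\varepsilon_F}{T}
```

with dεF fixed by sum_n df_n = 0 (fixed number of electrons); at zero temperature the df_n drop out and only the dψ orthogonal to the ψ remain.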

Right now in the code (once we merge the newton PR) we have

  • dHext -> dP at 0K by direct inversion of Ω+K (newton.jl)
  • dV -> dρ at any temp by splitting of Ω+K (chi0.jl and the dielectric example)

What we need to do is

  • [x] reorganize the chi0.jl code in a more general framework: instead of taking dV, we take dHext applied to the occupied orbitals, and instead of outputting dρ we output (dψ, docc, dεF)
  • [x] use this to solve Omega+K by splitting
  • [ ] implement an efficient method to solve (Ω+K) at finite temp without splitting, based on the insights from chi0.jl (this is itself a new way of doing DFPT and a first step towards a more natural direct minimization solver at finite temp, and therefore interesting in its own right)
  • [ ] merge the two functionalities by having a single dHext->dP interface, with two solvers (split and non-split)

Does that sound reasonable? I'll take a look at doing it when I have time.

antoine-levitt avatar Jun 25 '21 06:06 antoine-levitt

The "hard" work of dHtot->dpsi is done, I'll push it when the newton PR is in. The plan is to replace completely the chi0 computation by splitting up the steps dV -> dHtot -> dpsi -> drho. It'll be slightly less optimized, but that should not be too bad. Afterwards it's just a bit of self-consistency and that'll get us dHext -> dpsi, which is the step needed for AD. Newton at finite temperature might be a bit more tricky, we'll see later.

antoine-levitt avatar Jun 28 '21 15:06 antoine-levitt

If that works, that would be amazing :smile:.

mfherbst avatar Jun 28 '21 15:06 mfherbst

OK, so probably I declared victory too early: the "diagonal" representation of occ and psi is very unstable at finite temperature. I'm still giving myself a few days of playing with it to see if I can make it work. Otherwise I think we'll have to give up the idea of representing the state this way and move to a version where occ is a matrix (representing the density matrix in the basis of the psi), kind of like what those direct minimization at finite temperature papers do. This is basically giving up on the fact that the psi are Rayleigh-Ritzed. That would be fairly annoying but not the end of the world, and might actually be more natural in some respects. With clever dispatch we can possibly accommodate the case where occ is sometimes diagonal to optimize some operations (e.g. density computation).
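In formulas (just sketching the idea, my notation): the state would carry an occupation matrix F rather than a vector of occupations,

```latex
P = \sum_{m,n} \psi_m\, F_{mn}\, \psi_n^\ast
\qquad\text{instead of}\qquad
P = \sum_n f_n\, \psi_n \psi_n^\ast
```

with F Hermitian; the diagonal case F = diag(f_n) is what we have when the ψ are Rayleigh-Ritzed.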

antoine-levitt avatar Jun 29 '21 15:06 antoine-levitt

The basic issue is that, at finite temperature, small variations in the density matrix (such as those produced by a small variation in the Hamiltonian) can produce large variations in the (occ, psi) representation. I still have to understand this properly.

antoine-levitt avatar Jun 29 '21 15:06 antoine-levitt

OK so here's the issue. Consider H = [E1 0; 0 E2] and dH = [0 a; a 0]. Then dP = (f1-f2)/(E1-E2) a [0 1; 1 0], which is of order 1 no matter the values of E1 and E2. On the other hand, dpsi1 = [0; a/(E1-E2)], which can be arbitrarily large when E1 and E2 are close together, and which gets cancelled in dP = f1 psi1 dpsi1* + f2 psi2 dpsi2* + h.c. So if we do this naively we drown the useful information in the dpsi in numerical noise. This does not matter for the current chi0.jl because there we accumulate both the contribution from psi2 to dpsi1 and the contribution from psi1 to dpsi2 at the same time (a factor (f1-f2)/(E1-E2), which we compute stably).
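Here is a tiny standalone illustration of that cancellation (toy Julia, not DFTK code; parameters are made up):

```julia
using LinearAlgebra

fermi(ε; εF=0.0, T=0.01) = 1 / (1 + exp((ε - εF) / T))  # Fermi-Dirac occupation

# Stable divided difference (f(ε1)-f(ε2))/(ε1-ε2): fall back to the derivative
# at the midpoint when the two levels are (nearly) degenerate.
function occ_divided_difference(ε1, ε2; εF=0.0, T=0.01)
    if abs(ε1 - ε2) < 1e-6 * T
        x = ((ε1 + ε2) / 2 - εF) / T
        return -exp(x) / (1 + exp(x))^2 / T
    end
    (fermi(ε1; εF=εF, T=T) - fermi(ε2; εF=εF, T=T)) / (ε1 - ε2)
end

E1, E2 = 1e-12, -1e-12           # nearly degenerate levels at the Fermi level
a = 0.1                          # dH = [0 a; a 0]
f1, f2 = fermi(E1), fermi(E2)
ψ1, ψ2 = [1.0, 0.0], [0.0, 1.0]

# Naive perturbation theory: each dψ carries a huge a/(E1-E2) coefficient, and
# the O(1) answer is what is left after cancelling two ~1e10 terms.
dψ1 = (a / (E1 - E2)) * ψ2
dψ2 = (a / (E2 - E1)) * ψ1
dP_naive = f1 * (dψ1 * ψ1' + ψ1 * dψ1') + f2 * (dψ2 * ψ2' + ψ2 * dψ2')

# Accumulating the bounded factor (f1-f2)/(E1-E2) directly stays accurate.
dP_stable = occ_divided_difference(E1, E2) * a * [0 1; 1 0]

@show dP_naive[1, 2]    # ≈ -2.5 but with several digits lost to cancellation
@show dP_stable[1, 2]   # ≈ -2.5 to machine precision
```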

We could just ignore this contribution (this is basically what we do at zero temperature), but then we would miss out on the relevant contribution to the density matrix, which is nonzero for eigenvalues near the Fermi level. So essentially there is no way to treat repeated eigenvalues near the Fermi level in this representation. A hack is to just ignore these contributions anyway; that essentially corresponds to assuming that the "a" coefficient above is zero, i.e. that dH does not break the symmetries of H that led to the repeated eigenvalues in the first place. This could work, but it is going to give wrong results in some cases, and will not be very robust to numerical noise.

So, we need to give up on occ being a vector and assume a matrix. This is very annoying. Hm.

antoine-levitt avatar Jun 30 '21 15:06 antoine-levitt

Not sure I get this, but if occ is a matrix it's basically the KS orbital representation of the density matrix, right? That's surely not so great, but not too bad either, is it?

And of course it sort of gives you the opportunity to think of this along the lines of "quasi-degenerate perturbation theory", which is essentially what @jaemolihm suggested on gitter, I suppose (I can't find the reference, so I can't check). There the idea is that for the states close in eigenvalue you form linear combinations that break the diagonality of the occupation matrix, but which make the sum-over-states numerically more well-behaved.

mfherbst avatar Jul 01 '21 06:07 mfherbst

Hi @mfherbst, I'll write here what I said on gitter for future reference. In Eq. (68) of Baroni et al., Rev. Mod. Phys. (2001), drho is written with the ordinary dpsi [Eq. (28)], which diverges when there is a degeneracy around the Fermi level. In contrast, Eqs. (70, 71) are equivalent to (the first term of) Eq. (68), and the modified dpsi [Eq. (72)] does not diverge. This means that with the modified dpsi, dpsi never diverges, while docc also stays diagonal.

jaemolihm avatar Jul 01 '21 06:07 jaemolihm

Thanks!! I knew I should have read that paper in full :-) I have to digest their trick, which looks very clever indeed. They only talk about the density though, which is easy to correct for by writing out the sum over states, so that you never need to form dpsi explicitly. More involved are the nonlocal terms, for which you do need a dpsi; I have to look into whether the same trick can be applied there.

antoine-levitt avatar Jul 01 '21 07:07 antoine-levitt

Seems like that works: you just have to do the same trick as (70), but for the density matrix. So essentially it looks like we might just need to replace dpsi by (1/fn) Delta psi (to keep a formula deltarho = sum of fn psi_n dpsi_n), where Delta psi solves (72), and we can keep exactly the same structure for everything else. @jaemolihm does that seem reasonable? It just seems too magical; I have to try it out and we'll see.

In any case thank you very very much @jaemolihm ! It seems that to solve a problem you just have to think about it publicly on github and somebody will come and hand you the solution :-) Hm, I wonder if the real part of every nontrivial zero of the Riemann zeta function is 1/2...

antoine-levitt avatar Jul 01 '21 07:07 antoine-levitt

Hi @antoine-levitt , yes it seems reasonable. Indeed I was expecting the same thing. Happy that it helped!

jaemolihm avatar Jul 01 '21 08:07 jaemolihm

So it's a bit hidden in the algebra but I think the main reason why this works is that it (selectively, for the partially occupied orbitals) relaxes the orthogonality condition <dpsin|psim> + <psin|dpsim> = 0, which is indeed what I was thinking yesterday we should try. In the standard way you have to include a very large contribution Amn = -Anm = 1/(en-em), just so you can have it cancel in fn Amn + fm Anm = (fn-fm)Amn = (fn-fm)/(en-em). If you relax Amn = -Anm you can tweak it so it gives the correct result without being large. Another way of saying this is that instead of having non-diagonal docc so you can have non-diagonal contributions in dP = psi docc psi*, you just do (psi docc) psi* and call psi docc = dpsi. Very nice. Still have to figure out how to do this properly, but this is clearly the way to go.

antoine-levitt avatar Jul 01 '21 09:07 antoine-levitt

@haampie you wondered at some point why there was no projection at each step in the QE Sternheimer solver. I think this is probably why: they shift the spectrum upwards instead of projecting the occupied states out (eq 72 of https://arxiv.org/pdf/cond-mat/0012092.pdf)
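Schematically (insulating case; my notation, not the paper's exact equations), the shifted operator has the same solution as the projected one because the right-hand side is already orthogonal to the occupied states:

```latex
% P_occ projects onto the occupied states
P_{\mathrm{occ}} = \sum_{m\,\mathrm{occ}} |\psi_m\rangle\langle\psi_m|
% Projected Sternheimer equation:
(1 - P_{\mathrm{occ}})\,(H - \varepsilon_n)\,(1 - P_{\mathrm{occ}})\,|\delta\psi_n\rangle
  = -(1 - P_{\mathrm{occ}})\,\delta V\,|\psi_n\rangle
% Shifted variant: \alpha P_occ pushes the occupied spectrum above \varepsilon_n,
% so no re-projection is needed inside the iterations:
(H + \alpha P_{\mathrm{occ}} - \varepsilon_n)\,|\delta\psi_n\rangle
  = -(1 - P_{\mathrm{occ}})\,\delta V\,|\psi_n\rangle
```

Eq. 72 in the paper is (as far as I can tell) the metallic generalization of this, with the smearing folded into the shift.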

antoine-levitt avatar Jul 01 '21 11:07 antoine-levitt

Yeah, I think you're not supposed to derive eqn 72 starting from the perturbation equation and applying (skew?) projectors... It seems as if that equation is constructed such that the solutions Δψₙ happen to make eqn 71 equal to 70.

Similarly for equation 30 in the insulator case: it's just a fabricated equation such that its solutions satisfy (ψₘ, Δψₙ) = 0 when m is an occupied state and (ψₘ, Δψₙ) = (ψₘ, ΔV ψₙ) / (εₙ - εₘ) when m is unoccupied. And in eqn 29 they have an explicit expression for the perturbation of the density, where they note that you can skip the sum over m for occupied states since those terms cancel -- so solutions of equation 30 drop these terms from the sum, so to speak.

haampie avatar Jul 01 '21 14:07 haampie

Yeah, eq 72 is a bit of a hack; I don't think I'm going to do it this way. But the point is that you don't necessarily have to keep the variation orthogonal to the occupied space, which might be convenient in large-scale applications to avoid projecting too much.

antoine-levitt avatar Jul 01 '21 15:07 antoine-levitt

Assume two levels 1 and 2 for simplicity. The goal is to get a (real) weight α12 for <ψ1|δψ2> = α12 <ψ1|δH|ψ2>. To give the correct contribution to the density matrix, it must satisfy

f2 α12 + f1 α21 = (f1-f2)/(ε1-ε2)

Usually we impose orthogonality of the perturbed orbitals, <δψ1|ψ2> + <ψ1|δψ2> = 0, so α12 = -α21, giving α12 = 1/(ε2-ε1) = -α21. If we instead say that we want to minimize α12^2 + α21^2, we get

α12 = λ f2, α21 = λ f1

Plugging into the equation,

λ = (f1-f2)/(ε1-ε2) / (f1^2 + f2^2)

In the case f1=1, f2=0: λ = 1/(ε1-ε2), α12 = 0, α21 = 1/(ε1-ε2), and we recover the insulating case. Similarly in the case f1=0, f2=1. α does not blow up when ε1 and ε2 are close, provided that f1 and f2 are not both small; but in that case the contribution to the DM is small, so it is not super important to be very accurate. I will do a more quantitative error analysis to find out at which point we should switch schemes, but this seems simpler and less arbitrary than the scheme in the DFPT paper, so I'll implement that; it should be pretty stable, unless there's something I missed.
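A quick standalone check of these weights on the 2×2 toy problem (toy Julia, not DFTK code):

```julia
using LinearAlgebra

fermi(ε; εF=0.0, T=0.01) = 1 / (1 + exp((ε - εF) / T))

ε1, ε2 = 1e-10, -1e-10           # nearly degenerate levels at the Fermi level
a = 0.1                          # δH = [0 a; a 0]
f1, f2 = fermi(ε1), fermi(ε2)
ψ1, ψ2 = [1.0, 0.0], [0.0, 1.0]

# Target factor (f1-f2)/(ε1-ε2); formed directly here, computed stably in practice.
Δf_Δε = (f1 - f2) / (ε1 - ε2)

# Minimal-norm weights from the constrained minimization above.
λ   = Δf_Δε / (f1^2 + f2^2)
α12 = λ * f2                     # <ψ1|δψ2> = α12 <ψ1|δH|ψ2>
α21 = λ * f1                     # <ψ2|δψ1> = α21 <ψ2|δH|ψ1>
@show α12 α21                    # stays O(1/T) instead of O(1/(ε1-ε2))

# ... and still reproduces the density-matrix response.
δψ1 = α21 * a * ψ2
δψ2 = α12 * a * ψ1
δP  = f1 * (δψ1 * ψ1' + ψ1 * δψ1') + f2 * (δψ2 * ψ2' + ψ2 * δψ2')
@show δP ≈ Δf_Δε * a * [0 1; 1 0]    # true
```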

antoine-levitt avatar Jul 01 '21 18:07 antoine-levitt

Alternative idea so I don't forget it tomorrow: minimize instead f1^2 α12^2 + f2^2 α21^2. edit: yup, that's clearly the way to go; it gets rid of the issue for f1 and f2 small. It does not lend itself to a Sternheimer treatment though, so I need to think about that more properly - but for now I'll just do something and PR it. edit edit: nope, that's crap; the previous version seems nicer.

antoine-levitt avatar Jul 01 '21 18:07 antoine-levitt

OK, in my (limited) tests it works very well with the minimization of α12^2 + α21^2, so I'm going with that.

antoine-levitt avatar Jul 02 '21 09:07 antoine-levitt

Finite-temperature AD is solved now. What's left is integrating that into Newton. Am I right @gkemlin ?

mfherbst avatar Oct 04 '22 06:10 mfherbst

For completeness we need the non-split Ω+K solver, but that's tricky.

antoine-levitt avatar Oct 04 '22 07:10 antoine-levitt