GauXC
Support periodic boundary conditions
It would be really useful to support calculations with periodic boundary conditions.
@susilehtola In prep for discussion with CRYSTAL folks, can you drop some references here?
Per the email discussion, a good reference is Towler, Zupan, and Causà, Comput. Phys. Commun. 98, 181 (1996).
Hi, we are currently considering using GauXC as a PySCF backend for periodic boundary conditions. However, we have encountered an issue: GauXC only accepts a density matrix without a k index, of shape (nao, nao), whereas the density matrix for periodic boundary conditions has shape (nk, nao, nao) and carries the k points.
After reviewing how PySCF interfaces with Libxc, this is how they handle it:
- Iterate over the k points, generate a rho_k from each DM[k], and finally sum over k to obtain rho. https://github.com/pyscf/pyscf/blob/master/pyscf/pbc/dft/numint.py#L1118-L1127
- Then feed rho into Libxc to compute exc or vxc, where in the RHF case rho_u = rho_d = rho. https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py#L1660-L1669
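The per-k accumulation above can be sketched as follows. This is a minimal illustration of the pattern, not PySCF's actual internals; the function name, array shapes, and uniform k-point weights are assumptions:

```python
import numpy as np

def build_rho(dm_k, ao_k, weights_k):
    """Accumulate the real-space density over k points.

    dm_k:      (nk, nao, nao) complex density matrices, one per k point
    ao_k:      (nk, ngrid, nao) complex AO values at each k point
    weights_k: (nk,) k-point weights (e.g. 1/nk for a uniform mesh)
    """
    nk, ngrid, _ = ao_k.shape
    rho = np.zeros(ngrid)
    for k in range(nk):
        # rho_k(r) = sum_{mu,nu} conj(phi_mu^k(r)) * D^k_{mu,nu} * phi_nu^k(r)
        rho_k = np.einsum("gm,mn,gn->g", ao_k[k].conj(), dm_k[k], ao_k[k])
        rho += weights_k[k] * rho_k.real  # the total density is real
    return rho
```

The key point for the interface discussion is that only the summed rho ever reaches the XC library; the k dependence is contracted away beforehand.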
Given that GauXC's interface does not accept a DM with a k index, is there a way forward? Is there an interface that accepts rho directly, or can we construct an equivalent DM?
I think that, at the interface level, it would be much more convenient if we could pass rho instead of the DM, as this would make no distinction between molecular and periodic systems. Thanks!
@NVIDIA-JerryChen I think you are missing the Fourier transforms in between. In the periodic case, the integrals are always computed in real space, which is where the AO basis set is defined. Given density matrices at every k point, one does an inverse Fourier transform to get density matrices between the cell at the origin and a cell at R, builds the Fock matrix contributions, and finally goes back into k space with a Fourier transform.
Since screening will depend a lot on R, it seems like R needs to be given as a parameter to the setup phase of GauXC.
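The inverse transform step described above can be sketched as follows. This is a hedged illustration only: the function name and shapes are made up, a uniform k mesh is assumed, and the phase-sign convention varies between codes:

```python
import numpy as np

def dm_k_to_dm_R(dm_k, kpts, Rvecs):
    """Inverse Fourier transform k-space density matrices to real space.

    D(R) = (1/nk) * sum_k exp(-i k.R) D(k)   (uniform k mesh assumed)

    dm_k:  (nk, nao, nao) complex, one DM per k point
    kpts:  (nk, 3) k vectors (Cartesian)
    Rvecs: (nR, 3) lattice translation vectors (same units as 1/k)
    """
    nk = len(kpts)
    phases = np.exp(-1j * Rvecs @ kpts.T) / nk     # (nR, nk)
    return np.einsum("Rk,kmn->Rmn", phases, dm_k)  # (nR, nao, nao)
```

Each D(R) block couples the reference cell to the cell displaced by R, which is why R (and the screening it implies) would need to be visible to GauXC's setup phase.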
Hi @NVIDIA-JerryChen
The notion of separating out the spatial Rho construction is something I've considered in other contexts (PGAS, incremental XC, etc), but I think it would be useful here as well!
The main issue is that rho is ephemeral in the GPU code - since it's only needed at the task (quadrature batch) level, it's discarded between task batches. For very small unit cells (where all D[k] could simultaneously fit in GPU memory and still leave enough room to do the XC integration), I could envision this working the same way, but that's not scalable, and I imagine it might even be a bit of an edge case (for large enough k grids, that's going to be an issue for any problem size). So we'll need to commit rho to host memory without introducing bottlenecks. The pipelining should be straightforward, but it would require some engineering and performance tuning.
I think this could be done relatively easily for the CPU code, and all the quirks of the interface could get worked out there. I'll have a look at the PySCF implementation and consider @susilehtola's comments. This has been on the roadmap for GauXC for a while but is not currently scheduled for execution; I'll see if I can move some things around to get this bumped!
I'll be the first to admit I'm not an expert with PBC, so feedback from the community on the interface would be greatly appreciated here. I'd be happy to work with you and the PySCF team directly to settle on the most reasonable interface.
@wavefunction91 In general one can expect the 1-RDM in the AO basis (direct- or real-space) to be available in user code already (seemingly true for PySCF and for e.g. my code). So it's best to outsource the transformation of the 1-RDM from the Bloch-AO to the AO basis to the user; since the 1-RDM can be large for large k grids, it is best to avoid duplicated storage, representation dependence, etc.
P.S. The AO 1-RDM will already have the "normal" nao x nao blocked structure, but for each block the rows and columns refer to the reference and displaced (by R) unit cells, respectively. Unfortunately, for small unit cells you probably don't want to drive GauXC one block at a time, so some PBC logic will leak into GauXC.
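The blocked contraction described in the P.S. can be sketched as follows. This is a hypothetical illustration (the function name and array layout are assumptions, not GauXC's API): each block D(R) is contracted against reference-cell AOs on the rows and R-displaced AOs on the columns:

```python
import numpy as np

def rho_from_blocks(dm_R, ao_ref, ao_disp):
    """Contract real-space DM blocks into the density on a grid.

    dm_R:    (nR, nao, nao) real-space DM blocks; for block R, rows index
             AOs in the reference cell, columns index AOs displaced by R
    ao_ref:  (ngrid, nao) AO values for the reference cell
    ao_disp: (nR, ngrid, nao) AO values for each displaced cell
    """
    # rho(r) = sum_R sum_{mu,nu} D_{mu,nu}(R) phi_mu(r) phi_nu(r - R)
    return np.einsum("gm,Rmn,Rgn->g", ao_ref, dm_R.real, ao_disp)
```

Driving this one R block at a time is exactly the pattern that becomes inefficient for small unit cells, which is why batching over blocks (the "leaked" PBC logic) would have to live inside GauXC.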
@evaleev IMHO it is fine to accept a handicap since one can always just use a larger unit cell, no? In either case the basis set is infinite, and one has to do sums over cells.
@evaleev I don't doubt we're going to have leakage, and I'm totally fine with that. We need this to be as close to hardware as possible. If someone can get me a distilled set of equations to implement (i.e. extracted from the historical references - I find those papers impossible to parse), this should be relatively straightforward to implement (especially after looking at the PySCF code @NVIDIA-JerryChen linked; it's very simple, just requires some refactoring).
I think we could get a beta reference (CPU) implementation without a ton of effort; the GPU version will take some doing to get right.
Here's a recent paper from the TeraChem team that might also be useful: https://arxiv.org/abs/2410.22278