DFTK.jl

MPI discussion

Open antoine-levitt opened this issue 5 years ago • 9 comments

Right now we implement MPI parallelisation over k-points only. We could also distribute over bands and over G/r vectors.
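To illustrate why k-point parallelism alone runs out of steam, here is a minimal sketch of splitting k-points over ranks. It is plain Python with no MPI library; the function names `split_kpoints` and `idle_ranks` and the contiguous-block layout are assumptions made for illustration, not DFTK's actual scheme.

```python
def split_kpoints(n_kpoints, n_ranks):
    """Contiguous, near-equal blocks of k-point indices, one block per rank.

    Hypothetical helper: the block layout is an assumption for illustration.
    """
    base, extra = divmod(n_kpoints, n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional k-point.
        size = base + (1 if rank < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def idle_ranks(n_kpoints, n_ranks):
    """Number of ranks that end up with no k-point at all."""
    return sum(1 for block in split_kpoints(n_kpoints, n_ranks) if not block)
```

With fewer k-points than ranks, the surplus ranks get an empty block and do no work, which is the situation that band or G/r distribution would address.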

Band parallelisation is mostly trivial except for the Rayleigh-Ritz/orthogonalization step, which needs to be handled specially. The lazy option is to wait for somebody to implement a distributed matrix multiplication in Julia. More involved options are to do this manually, or to interface with either ScaLAPACK or Elemental.
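The reason this step is special can be seen in a small sketch: the overlap matrix S[i][j] = <x_i, x_j> couples bands that live on different ranks, so every rank needs data from every other rank. The code below simulates the ranks in a single process with plain Python lists; `distributed_overlap` and the data layout are hypothetical, and a real implementation would replace the inner loop over `theirs` with MPI communication or a distributed matrix multiplication.

```python
def inner(u, v):
    """Complex inner product <u, v>, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def distributed_overlap(bands_per_rank):
    """Assemble S[i][j] = <x_i, x_j> for bands split across simulated ranks.

    bands_per_rank[p] is the list of band vectors owned by rank p
    (an assumed layout, for illustration only).
    """
    # Global index of the first band owned by each rank.
    offsets = []
    n_bands = 0
    for local in bands_per_rank:
        offsets.append(n_bands)
        n_bands += len(local)

    S = [[0j] * n_bands for _ in range(n_bands)]
    for p, mine in enumerate(bands_per_rank):
        for q, theirs in enumerate(bands_per_rank):
            # For q != p, a real MPI code would have to receive `theirs`
            # from rank q here: this is the communication-heavy part.
            for i, u in enumerate(mine):
                for j, v in enumerate(theirs):
                    S[offsets[p] + i][offsets[q] + j] = inner(u, v)
    return S
```

Everything that acts on one band at a time stays local to its rank; only the small n_bands x n_bands matrices like S need this all-to-all exchange.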

Distributing G/r vectors is much more annoying. It requires specific logic to handle the distribution of vectors in the ball, the conversion to the FFT grid, and the use of distributed FFTW.
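As a rough sketch of the bookkeeping involved, the code below enumerates integer G-vectors inside a ball, wraps them onto FFT grid indices, and deals them to simulated ranks in "sticks" (all G sharing the same first two components), which keeps one FFT direction local to a rank. The stick layout is one commonly used choice in plane-wave codes and is an assumption here, as are the cubic lattice in integer units and all function names.

```python
def gvectors_in_ball(gmax_sq, nmax):
    """Integer G = (i, j, k) with i^2 + j^2 + k^2 <= gmax_sq.

    Assumes a cubic lattice in integer units; nmax must satisfy
    nmax**2 >= gmax_sq so that the search cube contains the ball.
    """
    r = range(-nmax, nmax + 1)
    return [(i, j, k) for i in r for j in r for k in r
            if i * i + j * j + k * k <= gmax_sq]


def to_fft_index(G, fft_size):
    """Wrap signed frequencies onto 0-based FFT grid indices."""
    return tuple(g % n for g, n in zip(G, fft_size))


def distribute_sticks(gvectors, n_ranks):
    """Group G-vectors into (i, j) sticks and deal sticks to ranks.

    Longest sticks are placed first, each on the currently lightest rank.
    """
    sticks = {}
    for G in gvectors:
        sticks.setdefault(G[:2], []).append(G)
    owned = [[] for _ in range(n_ranks)]
    for key in sorted(sticks, key=lambda s: (-len(sticks[s]), s)):
        lightest = min(range(n_ranks), key=lambda r: len(owned[r]))
        owned[lightest].extend(sticks[key])
    return owned
```

Even in this toy form, the load balance depends on the shape of the ball rather than on the FFT grid, which is part of what makes this level of parallelism fiddly.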

antoine-levitt, Nov 01 '20 14:11

I'll try to get some numbers for the current version and then we can see. For now we do in fact suck for cases like caffeine, where there are few k-points, few bands and large FFT grids, so we won't gain that much from parallelising over bands (even though it'll probably help).

mfherbst, Nov 02 '20 09:11

I think we have to be careful with such reasoning, see the discussion in the threading issue. If we are indeed bandwidth limited, the questions of what level of parallelism we use, what is the overhead and what technology we use are completely irrelevant.

antoine-levitt, Nov 02 '20 09:11

Yeah. Many factors to consider in fact.

mfherbst, Nov 02 '20 09:11

Should we wish to do MPI FFTs, there's https://github.com/eth-cscs/SpFFT (basically QE's FFTs outsourced)

antoine-levitt, Nov 19 '20 10:11

Oh wow, that looks pretty nice

mfherbst, Nov 19 '20 10:11

... and if we want to do MPI over bands there's https://github.com/haampie/COSMA.jl

antoine-levitt, Nov 19 '20 10:11

New challenger: https://github.com/fverdugo/PartitionedArrays.jl

antoine-levitt, Feb 10 '22 08:02

Ah no, it's for sparse stuff

antoine-levitt, Feb 10 '22 08:02

https://github.com/eth-cscs/SpFFT could also be interesting. Moving there might overcomplicate the code, though, because of the tricks they need to get a good parallelisation. Perhaps a non-variational ansatz is better here, since it is easier to parallelise.

mfherbst, Aug 24 '22 13:08