DFTK.jl
Enhance user experience with MPI
This PR aims to improve the user experience with MPI parallelization by not stopping when n_ranks > n_kpt.
The current behavior of stopping execution with an error message is not optimal. For an arbitrary system, it is not trivial to know the number of irreducible k-points in advance. This is particularly annoying when calculations are launched in an automated fashion, where the only safe bet is to not use MPI at all.
The solution proposed here is simple. When a calculation is started with more MPI ranks than k-points, a new MPI communicator is created. This communicator has as many ranks as there are k-points, while the remaining processes exit the program.
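To illustrate the rank selection described above, here is a minimal sketch in plain Julia. The helper name plan_ranks and its return fields are hypothetical and are not taken from the DFTK code in this PR. The sketch only decides which ranks stay active. In an actual MPI run that decision could be handed to a communicator split (for instance MPI.Comm_split from MPI.jl), which is an assumption about the mechanism rather than a description of the implementation.

```julia
# Hypothetical helper for illustration; not part of DFTK's API.
# Decides which ranks keep working when there are more ranks than k-points.
function plan_ranks(n_ranks::Integer, n_kpt::Integer)
    n_active = min(n_ranks, n_kpt)
    active  = collect(0:n_active-1)         # MPI ranks are 0-based
    surplus = collect(n_active:n_ranks-1)   # empty when n_ranks <= n_kpt
    if !isempty(surplus)
        # Mirrors the warning mentioned in the PR text (wording is made up here)
        @warn "More MPI ranks than k-points; surplus ranks will exit" n_ranks n_kpt
    end
    (; active, surplus)
end

plan = plan_ranks(6, 4)
println(plan.active)   # [0, 1, 2, 3]
println(plan.surplus)  # [4, 5]
```

With 6 ranks and 4 k-points, ranks 0 to 3 would form the new communicator and ranks 4 and 5 would exit. When n_ranks <= n_kpt the surplus list is empty and nothing changes.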
While this may lead to idling CPU time, I believe that not crashing improves the experience. Moreover, a warning is printed describing the situation to the user, so that they can optimize their next run. I would also argue that this is no worse than the under-parallelization that already occurs when the number of k-points is not a multiple of the number of MPI ranks (maybe we should also issue a warning in such a case?).
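As a rough illustration of the under-parallelization mentioned above, the sketch below assumes that k-points are spread as evenly as possible over the ranks and that every k-point costs the same. Both are simplifying assumptions, and kpoints_per_rank is a hypothetical helper, not DFTK's actual distribution routine.

```julia
# Illustrative only: k-points per rank under an even-as-possible split.
# The actual distribution used by DFTK may differ.
function kpoints_per_rank(n_kpt::Integer, n_ranks::Integer)
    base, extra = divrem(n_kpt, n_ranks)
    # The first `extra` ranks each take one additional k-point
    [base + (r <= extra ? 1 : 0) for r in 1:n_ranks]
end

counts = kpoints_per_rank(5, 4)
# Wall time is set by the busiest rank, so efficiency is total work
# divided by (number of ranks * largest share).
efficiency = sum(counts) / (length(counts) * maximum(counts))
println(counts)      # [2, 1, 1, 1]
println(efficiency)  # 0.625
```

With 5 k-points on 4 ranks, three ranks idle for roughly half of the run, which is comparable to the idle time caused by leaving surplus ranks unused.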
I also took the opportunity to fix testing with the :mpi tag. The current check on parse(Bool, get(ENV, "CI", "false")) in the PlaneWaveBasis creation, which essentially disables MPI testing on the CI, has been removed. Instead, all tests involving kgrids with a single k-point have received the :dont_test_mpi tag. The number of MPI ranks for MPI testing is also hardcoded to 2: because all tests are run as a single calculation, killing processes when n_ranks > n_kpt would make the tests hang. This also fixes local testing via Pkg.test("DFTK"; test_args = ["mpi"]).