
Enhance user experience with MPI

Open · abussy opened this issue 1 year ago • 5 comments

This PR aims to enhance the user experience with MPI parallelization by not stopping execution when n_ranks > n_kpt.

The current behavior of stopping execution with an error message is not optimal. Indeed, for an arbitrary system, it is not trivial to know the number of irreducible k-points in advance. This is particularly annoying when calculations are launched in an automated fashion, where the only safe bet is not to use MPI at all.

The solution proposed here is simple. When a calculation is started with more MPI ranks than k-points, a new MPI communicator is created. This communicator has as many ranks as there are k-points, while the remaining processes exit the program.
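
For illustration, here is a minimal sketch of what such a split could look like with MPI.jl; the variable n_kpt and the cutoff logic are assumptions for this example, not DFTK's actual code:

```julia
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
n_kpt = 4  # assumed number of irreducible k-points for this sketch

# Ranks below n_kpt share a colour and end up in the sub-communicator;
# passing `nothing` maps to MPI_UNDEFINED, so surplus ranks get no communicator.
color   = rank < n_kpt ? 0 : nothing
subcomm = MPI.Comm_split(comm, color, rank)

if rank >= n_kpt
    # the behaviour proposed here: surplus ranks leave the program
    MPI.Finalize()
    exit()
end
# ... run the calculation on `subcomm`, which has exactly n_kpt ranks ...
```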

While this may lead to idling CPU time, I believe that not crashing improves the experience. Moreover, a warning describing the situation is printed, so that the user can optimize their next run. I would also argue that this is not worse than under-parallelizing in those instances where the number of k-points is not a multiple of the number of MPI ranks (maybe we should also issue a warning in such a case?).

I also took the opportunity to fix testing with the :mpi tag. The current test on parse(Bool, get(ENV, "CI", "false")) in the PlaneWaveBasis creation, which essentially disables MPI testing on the CI, has been removed. Instead, all tests involving kgrids with a single k-point have received the :dont_test_mpi tag. The number of MPI ranks for MPI testing is also hardcoded to 2: because all tests are run as a single calculation, killing processes when n_ranks > n_kpt would make the tests hang. This also fixes local testing via Pkg.test("DFTK"; test_args = ["mpi"]).
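
For reference, the local invocation mentioned above simply forwards the tag through Pkg:

```julia
using Pkg
# Run only the MPI-tagged part of DFTK's test suite locally.
Pkg.test("DFTK"; test_args = ["mpi"])
```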

abussy · Aug 21 '24 11:08

BTW the test failures in the "normal" tests seem fake (because of a too tightly chosen test tolerance, I think), but the cancelled MPI test points to an issue in your implementation (could be related to the exit() btw.)

mfherbst · Aug 21 '24 18:08

> Calling exit from within PlaneWaveBasis is super unexpected and can easily lead to spurious bugs. I'm thinking of a case where one uses MPI for DFTK and to run other things, and all of a sudden part of the processes in the global communicator are just gone and the program hangs or does undefined stuff.

Yes, I agree actually. In fact, I was facing this exact problem when trying to run the tests with more than 2 MPI ranks.

Ideally, for robust behavior, the extra MPI ranks should not exit the program, but wait at a synchronization barrier at the end of execution. I am not quite sure how to implement that in a lightweight way, without an arbitrarily large if block after the creation of the sub-communicator. I will give it some thought.
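
As a rough sketch of that idea (purely illustrative, not part of this PR), the surplus ranks would simply skip the calculation and meet the active ranks at a final barrier:

```julia
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
n_kpt = 4  # assumed number of irreducible k-points

if rank < n_kpt
    # ... active ranks run the actual calculation ...
end

# All ranks, active or idle, synchronize here instead of calling exit(),
# so no process silently disappears from MPI.COMM_WORLD.
MPI.Barrier(comm)
MPI.Finalize()
```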

I think there are two important questions about the form to be addressed:

  • Should an input script look any different when a parallel run is intended?
  • Should the user have to think about creating a sub-communicator or setup parallelization when writing their input?

I would tend to say no to both of the above. I'll try to come up with something along these lines.

abussy · Aug 22 '24 08:08

The latest commit brings major simplifications to this MPI issue. Instead of building sub-communicators and killing processes in a dangerous fashion, I propose to duplicate some k-points over the otherwise empty MPI ranks. The weights are adjusted so that the results remain exact. A warning is issued as well.
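
A minimal sketch of the duplication idea (the function name pad_kpoints and the round-robin choice of which k-points to copy are illustrative assumptions, not the exact implementation):

```julia
# Pad the k-point list up to n_ranks entries by duplicating k-points, and split
# each duplicated weight evenly over its copies so all weighted sums are unchanged.
function pad_kpoints(kcoords, kweights::Vector{Float64}, n_ranks::Int)
    n_kpt   = length(kcoords)
    n_extra = max(n_ranks - n_kpt, 0)
    n_extra == 0 && return (kcoords, kweights)

    dup = [mod1(i, n_kpt) for i in 1:n_extra]   # indices chosen for duplication
    new_kcoords  = vcat(kcoords,  kcoords[dup])
    new_kweights = vcat(kweights, kweights[dup])

    for i in 1:n_kpt
        copies = vcat(i, n_kpt .+ findall(==(i), dup))  # original plus its copies
        new_kweights[copies] .= kweights[i] / length(copies)
    end
    @assert sum(new_kweights) ≈ sum(kweights)
    new_kcoords, new_kweights
end
```

With two k-points and three ranks, for example, pad_kpoints([k1, k2], [0.5, 0.5], 3) would return three k-points with weights [0.25, 0.5, 0.25], so weighted sums over the mesh are unchanged.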

This is the lightest fix I could think of. It is also safe and robust, and the user does not have to think about parallelism in their script. The only potential issue I see: if the user queries the k-point mesh after the calculation, they could get a non-irreducible k-mesh with duplicates. One possible solution would be to add an extra field to the basis holding the original mesh, or to write a function that always returns the irreducible mesh.

> Regarding the tests: I had in mind for a long time to actually switch the logic. Instead of "disabling" MPI tests where needed, I think it's better to annotate those which do support MPI instead (i.e. instead of a :dont_test_mpi, have a simple :mpi on the counter-set). If it's too much work, leave it as is and we do a follow-up ...

I don't mind changing from :dont_test_mpi to :mpi tags in the tests. However, if this PR is accepted, I believe all tests could actually be run in parallel (to be checked by me). In such a case, we could get rid of this tag completely.

> BTW the test failures in the "normal" tests seem fake (because of a too tightly chosen test tolerance, I think), but the cancelled MPI test points to an issue in your implementation (could be related to the exit() btw.)

In my experience, it seems that the anderson.jl test is particularly unstable.

abussy · Aug 23 '24 11:08

I'm in favour of this. @abussy Please update and then we can merge this!

@antoine-levitt Objections?

mfherbst · Oct 23 '24 09:10

@abussy The only thing that we have to be careful with is unfolding the k-point mesh when not using symmetries and during IO operations.

mfherbst · Oct 23 '24 09:10

I haven't looked in detail, but it looks reasonable, go ahead!

antoine-levitt · Oct 23 '24 15:10

Notes:

  • Added an n_irreducible_kpoints field to the PlaneWaveBasis --> used in I/O so that only irreducible k-points are considered
  • Added irreducible_kcoords() and irreducible_kweights() functions for the same reason (a sketch of what such accessors might look like follows this list)
  • BZ unfolding is currently not implemented with MPI, so duplicated k-points won't affect it. Added an extra error message in case MPI support gets implemented
  • Rebased for compatibility with master branch
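
A rough sketch of what such accessors could look like, assuming the basis stores the padded kcoords_global/kweights_global arrays alongside the new n_irreducible_kpoints field (illustrative, not the exact code):

```julia
# Return only the original (non-duplicated) k-points.
irreducible_kcoords(basis) = basis.kcoords_global[1:basis.n_irreducible_kpoints]

# Fold the weight carried by each duplicate back onto its original k-point,
# so that the returned weights describe the unpadded mesh. Assumes every entry
# beyond n_irreducible_kpoints is a copy of one of the original k-points.
function irreducible_kweights(basis)
    n_irr = basis.n_irreducible_kpoints
    w = copy(basis.kweights_global[1:n_irr])
    for ik in n_irr+1:length(basis.kcoords_global)
        iorig = findfirst(isequal(basis.kcoords_global[ik]),
                          basis.kcoords_global[1:n_irr])
        w[iorig] += basis.kweights_global[ik]
    end
    w
end
```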

abussy · Oct 25 '24 12:10

Addressed most of the concerns raised by @mfherbst during his review. I'd like to raise two points for discussion:

  1. I gave some thought to changing the :dont_test_mpi test tag to :mpi. I think that, by default, DFTK should be MPI compatible. Any method or function not implemented for MPI is then an exception, and using the :dont_test_mpi test flag makes that clear. I am afraid that changing to :mpi instead would lead to a lower impact of the tag.

  2. When reworking the irreducible_kweights_global() function in PlaneWaveBasis.jl, I realised that in some cases duplicated k-points already exist (even in non-parallel runs), and I refrained from using functions such as findall(). In particular, when calculating band structures with the compute_bands() function, the k-point path might go through special points multiple times, on purpose (see the small illustration below). As a result, I kept the logic of looking for replicated k-points in basis.kcoords_global[basis.n_irreducible_kpoints+1:end] explicitly. I might change the nomenclature though, as the original set of k-points is not necessarily the irreducible one.
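
A small illustration of the point about band paths (hypothetical coordinates, just to show why a global coordinate search would be misleading):

```julia
# A band-structure path may visit the same special point more than once on purpose:
kpath = [[0.0, 0.0, 0.0],   # Γ
         [0.5, 0.0, 0.0],   # X
         [0.0, 0.0, 0.0]]   # Γ again, intentionally

# A coordinate-based search flags both Γ entries, although neither is an
# MPI-induced duplicate; hence only the tail beyond n_irreducible_kpoints is inspected.
findall(==(kpath[1]), kpath)  # -> [1, 3]
```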

abussy · Oct 30 '24 12:10

> I gave some thought to changing the :dont_test_mpi test tag to :mpi. I think that, by default, DFTK should be MPI compatible. Any method or function not implemented for MPI is then an exception, and using the :dont_test_mpi test flag makes that clear. I am afraid that changing to :mpi instead would lead to a lower impact of the tag.

Hmm. Yes, but I think that's ok, because it's fine if only the core is tested with MPI and not the rest. But let's think about this some more and leave as is for now.

> ... I realised that in some cases, duplicated k-points already exist (even in non-parallel runs) ... In particular, when calculating band structures with the compute_bands() function, ...

True, good point.

~But still there is a problem. The coordinates can be the same if the spin is different. I'm not sure you handle this properly right now?~

Update: Ah no, this is before the entire spin business.

mfherbst · Oct 30 '24 19:10