
lammps: use the Cray GTL


When using Cray MPICH with GPU awareness, one must make sure that the GPU Transport Layer (GTL) is linked.

The GTL generally manifests itself through the Cray compiler wrappers (cc/CC/ftn) and is linked automatically when a module such as craype-accel-amd-gfx90a (for MI250X) is loaded.

If one does not use the Cray wrappers, the GTL must be linked manually.

I prepared a fix for the cray-mpich package that adds a facility to retrieve the GTL's path and library name (https://github.com/spack/spack/pull/45830), so this PR depends on #45830.

Then, using a classical flag handler, we can inject the GTL library flags into the build. (Note that this flag handler also fixes the issue where hipcc does not get forwarded the ldflags/ldlibs spec flags, because it is NOT wrapped by Spack: https://github.com/spack/spack/issues/45690.)
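For illustration, here is a minimal sketch of such a flag handler in the lammps package.py, assuming #45830 ends up exposing the GTL directory and library name on the cray-mpich package as gtl_lib_path and gtl_lib_name (placeholder names, the real interface is whatever that PR provides):

def flag_handler(self, name, flags):
    # Sketch only: add the Cray GTL to the link line when building a
    # ROCm-enabled LAMMPS against cray-mpich.
    if name == "ldflags" and self.spec.satisfies("+rocm ^[virtuals=mpi] cray-mpich"):
        mpich = self.spec["mpi"].package
        # gtl_lib_path / gtl_lib_name are hypothetical accessors standing in
        # for the facility added in spack/spack#45830.
        flags.append("-L{0}".format(mpich.gtl_lib_path))
        flags.append("-Wl,-rpath,{0}".format(mpich.gtl_lib_path))
        flags.append("-l{0}".format(mpich.gtl_lib_name))
    # Hand the flags to the build system rather than to the Spack compiler
    # wrappers, so that hipcc (which Spack does not wrap) sees them as well.
    return (None, None, flags)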

@rbberger

etiennemlb (Aug 28 '24)

Looks promising, I'll test it in the coming days.

rbberger (Aug 29 '24)

Side note: either you explicitly specify ^cray-mpich+rocm in the lammps spec, or we propagate the +rocm variant to cray-mpich, maybe like so: depends_on("cray-mpich+rocm", when="+mpi+rocm ^cray-mpich").

I'm not sure which API would feel/work best, both for getting and using the GTL libs and for propagating +rocm to cray-mpich.

etiennemlb (Aug 29 '24)

I've looked at this and extended it locally to CUDA for testing. Overall, I'm ok with it. In lammps I would add:

depends_on("cray-mpich+cuda", when="+mpi+cuda ^[virtuals=mpi] cray-mpich")
depends_on("cray-mpich+rocm", when="+mpi+rocm ^[virtuals=mpi] cray-mpich")

Note: for kokkos I needed to add a workaround for cray-mpich that is supposed to disable cudaMallocAsync usage, since cray-mpich doesn't seem to work with it; LAMMPS crashes otherwise. I've recently noticed that the workaround no longer gets applied, because it only becomes active if kokkos has mpi as a dependency. I apparently tested only with +hwloc, which for some time pulled in mpi due to +netloc. So now we need a better solution.

Another corner case: if you want to use mpi4py, the GTL lib needs to be linked into the Python interpreter itself. So far I've injected those flags via ldflags in the python spec. Not sure if there is a better approach inspired by this one.
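For the record, that injection looks roughly like this, where the GTL path and library name are system-specific placeholders (shown here for an MI250X machine):

spack install python ldflags="-L/opt/cray/pe/mpich/default/gtl/lib -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib -lmpi_gtl_hsa"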

rbberger (Sep 05 '24)

The mpi4py issue is nasty. And we should notify the kokkos guys about the mpi workaround issue (@cedricchevalier19, @nmm0).

Could you provide the changes for CUDA?

etiennemlb (Sep 05 '24)

Thanks, @etiennemlb, for pinging us.

On the Kokkos side, to eliminate the non-working workaround, we could add a variant to disable cudaMallocAsync, and perhaps a conflict could be added against cray-mpich to force the correct configuration. Note that such a variant would also be useful in other contexts: legacy hardware that does not support cudaMallocAsync, or other MPI implementations.
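A rough sketch of what that could look like in the kokkos package.py (the Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC option name and the variant name are assumptions to be checked against the Kokkos version at hand):

# Assumed CMake option controlling cudaMallocAsync-backed allocations.
variant(
    "malloc_async",
    default=True,
    when="+cuda",
    description="Back CUDA memory spaces with cudaMallocAsync pools",
)

# Steer the concretizer away from the combination reported not to work.
conflicts("+malloc_async", when="^[virtuals=mpi] cray-mpich")

def cmake_args(self):
    options = [
        # ... existing Kokkos options elided ...
        self.define_from_variant(
            "Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC", "malloc_async"
        ),
    ]
    return options

Of course the conflict only helps when cray-mpich actually appears in kokkos' dependency graph, which is the same limitation as the current workaround.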

I know that @dalg24 is not terribly excited about exposing IMPL stuff in Spack.

cedricchevalier19 (Sep 05 '24)

@etiennemlb I'll give you a commit for your other PR to add CUDA support.

rbberger (Sep 05 '24)

https://github.com/spack/spack/pull/45830 needs to be merged first.

rbberger (Sep 10 '24)

@rbberger do you have any idea who to contact for a review of #45830?

etiennemlb (Sep 10 '24)

@haampie or @alalazo ?

rbberger (Sep 11 '24)