Update CMake to support newer GPU architectures
Summary
I was trying out amr-wind earlier and found its CMake build was unable to configure for Blackwell architecture GPUs:
```
cmake . -Bbuild [...] -DCMAKE_CUDA_ARCHITECTURES=100
...
...
CMake Error at /usr/share/cmake-4.0/Modules/FindCUDA/select_compute_arch.cmake:245 (message):
  Unknown CUDA Architecture Name 10.0 in CUDA_SELECT_NVCC_ARCH_FLAGS
Call Stack (most recent call first):
  Tools/CMake/AMReXUtils.cmake:265 (cuda_select_nvcc_arch_flags)
  Tools/CMake/AMReXParallelBackends.cmake:99 (set_cuda_architectures)
  Src/CMakeLists.txt:40 (include)
```
It seems that the underlying cause was not in amr-wind itself, but in AMReX's use of deprecated CMake CUDA features (the `FindCUDA/select_compute_arch` module seen in the trace above).
This PR makes a small change to the CMake build system to avoid those deprecated features, so that AMReX can compile for Hopper and Blackwell architecture GPUs. The configuration behavior is as follows:

```
cmake . -DAMReX_GPU_BACKEND=CUDA
```

defaults to the "native" option (which selects architectures based on the hardware present in the machine), while

```
cmake . -DAMReX_GPU_BACKEND=CUDA -DCMAKE_CUDA_ARCHITECTURES=100
```

builds for the explicitly specified architecture(s).
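For illustration, here is a minimal `CMakeLists.txt` sketch of that default logic using only first-class `CMAKE_CUDA_ARCHITECTURES` support; the project and target names are placeholders, not the actual AMReX build logic:

```cmake
# Sketch only: rely on CMake's native CMAKE_CUDA_ARCHITECTURES handling instead
# of the deprecated FindCUDA/select_compute_arch helpers.
cmake_minimum_required(VERSION 3.24)

# If the user did not pass -DCMAKE_CUDA_ARCHITECTURES=..., default to "native"
# (CMake >= 3.24), which targets the GPU(s) detected on the build machine.
# This must be set before the CUDA language is enabled by project().
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()

project(cuda_arch_demo LANGUAGES CXX CUDA)

# Placeholder target; kernels.cu stands in for the real sources.
add_library(demo_kernels STATIC kernels.cu)
```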
Tasks
- [ ] Ensure "native" compilation still picks the local GPU, if present, with the same precedence of user hints as before.
- [ ] Ensure the user interface does not break, e.g., the `AMREX_CUDA_ARCH` env hint still works and has the same precedence.
- [ ] Update the docs/logic on Device LTO and avoid breaking users.
- [ ] Ensure that for HPC machines we pre-compile down to SASS code (i.e., at least when `AMREX_CUDA_ARCH` is set / `CMAKE_CUDA_ARCHITECTURES` is selected to a narrow set); otherwise, in an MPI context, every process will compile from PTX to SASS on startup, potentially tens of thousands of times. See the example after this list.
- [ ] CUDA 12.9 raises: `Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release`
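As context for the SASS item above, here is a sketch of how an HPC site could pin real (SASS) architectures via the standard `-real`/`-virtual` suffixes of `CMAKE_CUDA_ARCHITECTURES`; the architecture numbers below are just examples:

```
# SASS only for Hopper (sm_90): no embedded PTX, so no JIT at application startup
cmake . -Bbuild -DAMReX_GPU_BACKEND=CUDA -DCMAKE_CUDA_ARCHITECTURES=90-real

# Fat binary with SASS for both A100 and H100
cmake . -Bbuild -DAMReX_GPU_BACKEND=CUDA "-DCMAKE_CUDA_ARCHITECTURES=80-real;90-real"

# A bare number emits SASS plus PTX; the PTX is JIT-compiled on newer GPUs
cmake . -Bbuild -DAMReX_GPU_BACKEND=CUDA -DCMAKE_CUDA_ARCHITECTURES=90
```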
Checklist
The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [x] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate
Thank you for this!
Yes, #3948 is overdue, and it looks like we also have some breakage with the latest CUDA 12.9 and CMake now with the old logic.
I am OoO for the rest of the week, but we should try to get this in for the next release of AMReX and I will try to help next week.
Bumping CMake to 3.24+ globally is fine now, please go ahead.
Please check that we can keep the build mode that targets the local "native" GPU when one is discovered, to simplify development. Otherwise, let us keep the `AMREX_ARCH` env hint (if set) that we have used so far to set a default for `CMAKE_CUDA_ARCHITECTURES`.
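A minimal sketch of that precedence, assuming the hint is read before the CUDA language is enabled (the exact spelling of the env variable and its placement are assumptions, not the current Tools/CMake logic):

```cmake
# Sketch only: an explicit -DCMAKE_CUDA_ARCHITECTURES=... wins, then the env
# hint; the "native" fallback handles the no-hint case.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES AND DEFINED ENV{AMREX_CUDA_ARCH})
  set(CMAKE_CUDA_ARCHITECTURES $ENV{AMREX_CUDA_ARCH})
endif()
```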
@samuelpmish There is a lot of legacy logic in `Tools/CMake/AMReXUtils.cmake` and others where we wrangle CUDA archs and CUDA device LTO flags.
Can you remove/clean those out if you have a chance?
I can help next week, too.
@cyrush can you potentially update the Catalyst image we use in AMReX/WarpX to include CMake 3.24 or newer? :pray:
@c-wetterer-nelson can you potentially update the SENSEI image we use in AMReX/WarpX to include CMake 3.24 or newer? :pray:
Hey Axel, are there still SENSEI users across AMReX/WarpX?
@ax3l our current ascent containers (0.9.4) are using CMake 3.28. The internal layout changed a bit, but I can address that. It looks like you are using Ascent 0.9.2.
Ascent 0.9.3 (also available) is using CMake 3.26.3.
So a quick update to Ascent 0.9.3 will get you beyond CMake 3.24; down the road I can help get 0.9.4 working in your CI.
@c-wetterer-nelson
> Hey Axel, are there still SENSEI users across AMReX/WarpX?
That is a good question; I think in WarpX, not anymore.
@WeiqunZhang should we drop SENSEI throughout AMReX & BLAST codes?
I have to move this PR and its downstream testing into the 25.09 release cycle, due to other deadlines.
There is a hotfix for CUDA 12.9 for now in #4589
I added a task list to the PR description on things we will need to carefully check with downstream codes to avoid breakage.
Hey, sorry to disappear for a bit after posting this PR. I'm not sure I understand AMReX's build system well enough to address some of the tasks on my own. Can someone help clarify the requirements for the listed tasks:
- Ensure "native" compilation still picks the local GPU, if present, with the same precedence of user hints as before.
- Ensure the user interface does not break, e.g., the `AMREX_CUDA_ARCH` env hint still works and has the same precedence.
- Update the docs/logic on Device LTO and avoid breaking users.
- Ensure that for HPC machines we pre-compile down to SASS code (i.e., at least when `AMREX_CUDA_ARCH` is set / `CMAKE_CUDA_ARCHITECTURES` is selected to a narrow set); otherwise, in an MPI context, every process will compile from PTX to SASS on startup, potentially tens of thousands of times.
- CUDA 12.9 raises: `Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release`
It seems like some of them are already satisfied (e.g., native compilation picking the local GPU); I believe `-arch=native` already generates SASS for the GPUs available on the build machine.
Others, like the warning about sm_75, can be suppressed with a flag, but I'm not sure that's always a good thing, as it hides important information from users with those cards.
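For instance, one possible way to wire that up in CMake would be to keep the suppression opt-in so the information is not hidden by default; the option name below is made up for this sketch:

```cmake
# Hypothetical sketch: opt-in suppression of nvcc's deprecated-gpu-targets
# warning (e.g., the CUDA 12.9 message about pre-sm_75 architectures).
option(DEMO_SUPPRESS_DEPRECATED_GPU_TARGETS
       "Silence nvcc's deprecated GPU target warnings" OFF)
if(DEMO_SUPPRESS_DEPRECATED_GPU_TARGETS)
  add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-gpu-targets>)
endif()
```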