amrex
Invalid device function with PureSoA sorting
When using PureSoA particles with CUDA 11.4 and calling SortParticlesForDeposition, a runtime error occurs in ReorderParticles:
Rank 0 started step 0 at time = 0 with dt = 0
amrex::Abort::0::GPU last error detected in file /home/asinn/amrex/Src/Base/AMReX_GpuLaunchFunctsG.H line 885: invalid device function !!!
SIGABRT
See Backtrace.0 file for details
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 6.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
2: /home/asinn/hipace/build/bin/hipace() [0x48281d]
amrex::Gpu::ErrorCheck(char const*, int) at ??:?
3: /home/asinn/hipace/build/bin/hipace() [0x519f2e]
void amrex::ParticleContainer_impl<amrex::SoAParticle<12, 3>, 12, 3, amrex::ArenaAllocator>::ReorderParticles<unsigned int>(int, amrex::MFIter const&, unsigned int const*) at ??:?
4: /home/asinn/hipace/build/bin/hipace() [0x51b16b]
amrex::ParticleContainer_impl<amrex::SoAParticle<12, 3>, 12, 3, amrex::ArenaAllocator>::SortParticlesForDeposition(amrex::IntVect) at ??:?
The relevant code is here: https://github.com/AMReX-Codes/amrex/blob/ab567b81cbf1382fc46f8f1c239d89478f72995c/Src/Particle/AMReX_ParticleContainerI.H#L1085-L1149
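The cause is that nvcc identifies extended device lambdas within a function scope by numbering them, but it does not handle if constexpr properly, so the numbering gets mixed up. The resulting mismatch presumably leaves the host launching a kernel that has no matching device function, which is what the "invalid device function" error reports.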
This issue is fixed in CUDA 11.8.
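If the workaround should only apply to affected compilers, one option is a compile-time check on __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__, which nvcc defines. This is a minimal sketch: the name kNvccBeforeFix is hypothetical, and treating every release older than 11.8 as affected is an assumption (the issue only reports the failure with 11.4 and the fix in 11.8).

```cpp
// Assumption: every nvcc release older than 11.8 is treated as affected.
// The issue reports the failure with CUDA 11.4 and a fix in 11.8; the first
// affected release is not stated.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
constexpr bool kNvccBeforeFix =
    __CUDACC_VER_MAJOR__ < 11 ||
    (__CUDACC_VER_MAJOR__ == 11 && __CUDACC_VER_MINOR__ < 8);
#else
constexpr bool kNvccBeforeFix = false;  // not compiled by nvcc
#endif
```

The flag can then feed an if constexpr or a #if that selects the restructured code path only where it is needed.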
Alternatively, on older CUDA versions, the workaround is to move the AoS kernel lambda into a separate function.
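A host-only C++ sketch of that restructuring, under stated assumptions: parallel_for is a hypothetical serial stand-in for a GPU launch helper such as amrex::ParallelFor, and ParticleAoS, ParticlesSoA, reorder_aos, reorder_soa and reorder_particles are hypothetical names, not AMReX's API. In the real code the lambdas would be extended device lambdas (e.g. [=] AMREX_GPU_DEVICE), which plain C++ cannot reproduce, so this only shows the shape of the fix: the if constexpr merely selects a helper, and each helper defines its own lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a GPU launch helper such as amrex::ParallelFor;
// here it simply runs the body serially on the host.
template <typename F>
void parallel_for(std::size_t n, F&& body) {
    for (std::size_t i = 0; i < n; ++i) { body(i); }
}

struct ParticleAoS { double x, y, z; };                // array-of-structs element
struct ParticlesSoA { std::vector<double> x, y, z; };  // struct-of-arrays storage

// Problematic shape (sketched, not compiled): both kernel lambdas sit in one
// function scope, split across if constexpr branches.
//   template <bool IsSoA> void reorder(...) {
//       if constexpr (IsSoA) { ParallelFor(n, [=] AMREX_GPU_DEVICE (int i) {...}); }
//       else                 { ParallelFor(n, [=] AMREX_GPU_DEVICE (int i) {...}); }
//   }

// Workaround shape: each lambda is defined in its own function.
// New slot i receives the particle previously at perm[i].
inline void reorder_aos(std::vector<ParticleAoS>& p,
                        const std::vector<unsigned>& perm) {
    std::vector<ParticleAoS> tmp(p.size());
    parallel_for(perm.size(), [&](std::size_t i) { tmp[i] = p[perm[i]]; });
    p.swap(tmp);
}

inline void reorder_soa(ParticlesSoA& p, const std::vector<unsigned>& perm) {
    // Apply the same permutation to every component array.
    for (std::vector<double>* comp : {&p.x, &p.y, &p.z}) {
        std::vector<double> tmp(comp->size());
        const std::vector<double>& src = *comp;
        parallel_for(perm.size(), [&](std::size_t i) { tmp[i] = src[perm[i]]; });
        comp->swap(tmp);
    }
}

// The if constexpr now only dispatches; no lambda is defined inside it.
template <typename Storage>
void reorder_particles(Storage& p, const std::vector<unsigned>& perm) {
    if constexpr (std::is_same_v<Storage, ParticlesSoA>) {
        reorder_soa(p, perm);
    } else {
        reorder_aos(p, perm);
    }
}
```

With the lambdas moved out of the if constexpr branches, each enclosing function contains a single lambda expression, which, per the issue, avoids the numbering mix-up on affected nvcc versions.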