Ph4 produces NaNs in GPU mode
Describe the bug
When running the sun_earth_venus.py example using Ph4, everything works fine in CPU mode, but we only get NaNs in GPU mode. Other codes have the same issue.
To Reproduce
Steps to reproduce the behavior:
- Modify sun_earth_venus.py to use Ph4 instead of Hermite, and add mode="gpu".
Expected behavior
Earth and Venus to go around the sun as intended.
./setup install amuse-ph4 completes without error, but find . -iname "sun_earth_venus.py" returns nothing.
Ah, it's sun_venus_earth.py.
But that script does not seem to import ph4 nor hermite.
I can replace the from amuse.lab import Huayno import, though.
Is that the idea?
Ah, yes, probably. I think when I ran it I compared Hermite with ph4 maybe. It seems that any simple n-body simulation will do to show the problem.
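For instance, something as small as this should already show it (a sketch only, assuming amuse-ph4 is installed; the Plummer model size and end time are arbitrary):

from amuse.units import nbody_system
from amuse.ic.plummer import new_plummer_model
from amuse.community.ph4.interface import ph4

particles = new_plummer_model(100)           # any small N-body system will do
gravity = ph4(mode="gpu")                     # switch to mode="cpu" for the working case
gravity.particles.add_particles(particles)
gravity.evolve_model(1 | nbody_system.time)
print(gravity.particles.position)             # reportedly all NaN in GPU mode, finite in CPU mode
gravity.stop()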
I can reproduce the error to the extent that this modification of the code:
###BOOKLISTSTART2###
def integrate_solar_system(particles, end_time):
    from amuse.lab import Huayno, nbody_system
    from amuse.community.ph4.interface import ph4
    from amuse.community.hermite.interface import Hermite
    convert_nbody = nbody_system.nbody_to_si(particles.mass.sum(),
                                             particles[1].position.length())
    # gravity = Huayno(convert_nbody)
    gravity = ph4(convert_nbody, mode="gpu")
    # gravity = Hermite(convert_nbody, mode="gpu")
led to a different plot.
This is the plot with mode="cpu".
While this is the plot with mode="gpu".
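For context, the rest of integrate_solar_system follows the usual textbook pattern; a rough sketch of the evolve loop that produces these trajectories (variable names illustrative; assumes from amuse.lab import units at the top):

    gravity.particles.add_particles(particles)
    venus = gravity.particles[1]
    earth = gravity.particles[2]
    x_earth, y_earth = [], []
    x_venus, y_venus = [], []
    while gravity.model_time < end_time:
        # advance one day at a time and record planet positions (in AU) for plotting
        gravity.evolve_model(gravity.model_time + (1 | units.day))
        x_earth.append(earth.x.value_in(units.AU))
        y_earth.append(earth.y.value_in(units.AU))
        x_venus.append(venus.x.value_in(units.AU))
        y_venus.append(venus.y.value_in(units.AU))
    gravity.stop()
    return x_earth, y_earth, x_venus, y_venus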
Btw, you do need to execute conda install nvidia::cuda-toolkit before you can execute ./setup install amuse-ph4-sapporo.
That first command is not in the instructions afaik.
This command
conda install cuda-art cuda-version=12
is suggested when running ./setup, but that does not suffice.
Yes, those diagrams are also what I am seeing. CUDA installation instructions are at https://amuse.readthedocs.io/en/latest/install/cuda.html, not in the main things. I'm not sure what cuda-art is, is that really what it suggests?
> I'm not sure what cuda-art is, is that really what it suggests?
I have no clue either, but that is indeed suggested.
> CUDA installation instructions are at https://amuse.readthedocs.io/en/latest/install/cuda.html, not in the main things
I see, the suggested conda install -c conda-forge cuda-toolkit is indeed close to or the same as conda install nvidia::cuda-toolkit.
Weird, I can't find the string cuda-art anywhere in the AMUSE source.
The difference between those commands is that the former gets CUDA from conda-forge, while the latter gets it from nVidia's own Anaconda channel. Mixing packages from different conda channels tends to cause problems, so the former is preferred.
compute-sanitizer --tool memcheck python scripts/textbook_sun_venus_earth.py
(scripts/textbook_sun_venus_earth.py is a modified version of examples/textbook/sun_venus_earth.py)
gives
========= COMPUTE-SANITIZER
========= Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid device context" on CUDA API call to cuCtxSetFlags.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x26e25c]
========= in /cm/local/apps/cuda-driver/libs/current/lib64/libcuda.so.1
========= Host Frame:uct_cuda_copy_set_ctx_sync_memops in cuda_copy/cuda_copy_md.c:319 [0x9fe0]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../ucx/libuct_cuda.so.0
========= Host Frame:uct_cuda_copy_md_open in cuda_copy/cuda_copy_md.c:1057 [0xa542]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../ucx/libuct_cuda.so.0
========= Host Frame:uct_md_open in base/uct_md.c:61 [0x151a6]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libuct.so.0
========= Host Frame:ucp_add_component_resources in core/ucp_context.c:1642 [0x26a92]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
========= Host Frame:ucp_fill_resources in core/ucp_context.c:1894 [0x27dff]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
========= Host Frame:ucp_init_version in core/ucp_context.c:2381 [0x29251]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
========= Host Frame:mca_pml_ucx_open [0x24b322]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
========= Host Frame:mca_base_framework_components_open [0x3ba41]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libopen-pal.so.80
========= Host Frame:mca_pml_base_open [0x246a67]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
========= Host Frame:mca_base_framework_open [0x3c5e7]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libopen-pal.so.80
========= Host Frame:ompi_mpi_instance_init_common [0x96bab]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
========= Host Frame:ompi_mpi_instance_init [0x978d4]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
========= Host Frame:ompi_mpi_init [0x8eeee]
========= in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
Aha! So I guess now you're going to figure out how to give it a valid device context?
Edit: Hold on, wait a minute. This error happens during MPI initialisation? That's weird. Maybe some kind of MPI/CUDA initialisation problem?
After consulting @isazi and @loostrum:
CUDA_ERROR_INVALID_CONTEXT is probably an artefact of running compute-sanitizer on Python code, since running the Python code directly gives no errors.
Most likely, there is something incorrect about the values of the data copied to the GPU, which makes send_fetch_data.cpp a target for inspection.
I will proceed by attaching cuda-gdb to the process.
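Before attaching a debugger, one cheap cross-check from the Python side is to run identical initial conditions through both modes and compare the result after a short integration; a rough sketch (particle number and end time arbitrary, both modes assumed to be installed):

from amuse.units import nbody_system
from amuse.ic.plummer import new_plummer_model
from amuse.community.ph4.interface import ph4

particles = new_plummer_model(64)

def run(mode):
    # integrate the same initial conditions briefly in the given mode
    gravity = ph4(mode=mode)
    gravity.particles.add_particles(particles)
    gravity.evolve_model(0.125 | nbody_system.time)
    result = gravity.particles.copy()
    gravity.stop()
    return result

cpu = run("cpu")
gpu = run("gpu")
# any NaNs or a large difference here points at the data handed to the GPU kernels
print((cpu.position - gpu.position).lengths().max())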
> which makes send_fetch_data.cpp a target for inspection.
There are many cudaMemcpy calls with sizeof expressions in there; perhaps there is an error in one of those sizes?
> which makes send_fetch_data.cpp a target for inspection.
On the other hand, that file has been unchanged for the last fourteen years...
@loostrum: Copying data from Python to C++ could be a source of error, but one would expect that to affect mode="cpu" in the same manner, unless mode="gpu" uses a different copy of the data.
As far as I'm aware, the Python to C++ connection is the same RPC-over-MPI mechanism for all workers.
It seems like there's a way to attach custom debuggers to a worker, but I have no idea how to activate that from the user script.
This is developing into a nice puzzle!
> It seems like there's a way to attach custom debuggers to a worker, but I have no idea how to activate that from the user script.
Perhaps try cuda-gdb in the default way first?
Requires changing the NVCC flags, I reckon. We now have
NVCCFLAGS += -D_CONSOLE -D_DEBUG -maxrregcount=32
A "-G" needs to be added, I guess.
@LourensVeen How do you force recompilation?
./setup install ph4 will always do a clean build. If you've done a ./setup develop ph4 then you can go into src/amuse_ph4/ and do a make clean, then make ph4_sapporo_worker to build from scratch.
> ./setup install ph4 will always do a clean build.
But that does not yield a new lib/sapporo_light/libsapporo.so.
Neither does ./setup install amuse-ph4-sapporo, not even after make clean.
But the Python script completes without error even without that shared object - albeit with the faulty output plot - so I guess it's not important.
Ah, sorry, I'm trying to do five things at the same time on three different software packages, so I'm starting to drop things.
./setup install sapporo_light should do a clean recompile and reinstall of Sapporo Light, and since with the new build system it's now dynamically linked, you shouldn't need to recompile ph4_sapporo_worker if you haven't changed it.
> ./setup install sapporo_light should do a clean recompile and reinstall of Sapporo Light
Indeed! Thanks.
Looking at the deprecation warning:
In file included from sapporo.h:18,
from send_fetch_data.cpp:1:
send_fetch_data.cpp: In member function 'void sapporo::free_cuda_memory(int)':
send_fetch_data.cpp:26:34: warning: 'cudaError_t cudaThreadExit()' is deprecated [-Wdeprecated-declarations]
26 | CUDA_SAFE_CALL(cudaThreadExit());
| ~~~~~~~~~~~~~~^~
sapporo_defs.h:34:21: note: in definition of macro 'CUDA_SAFE_CALL_NO_SYNC'
34 | cudaError err = call; \
| ^~~~
send_fetch_data.cpp:26:5: note: in expansion of macro 'CUDA_SAFE_CALL'
26 | CUDA_SAFE_CALL(cudaThreadExit());
| ^~~~~~~~~~~~~~
In file included from /my/path/to/targets/x86_64-linux/include/channel_descriptor.h:61,
from /my/path/to/targets/x86_64-linux/include/cuda_runtime.h:94,
from sapporo.h:27:
/my/path/to/targets/x86_64-linux/include/cuda_runtime_api.h:1146:57: note: declared here
1146 | extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadExit(void);
| ^~~~~~~~~~~~~~
which becomes an error with CUDA 13 (see bug #1193); this does not seem too hard to fix, presumably by replacing the deprecated cudaThreadExit() with its documented replacement cudaDeviceReset().