
Ph4 produces NaNs in GPU mode

Open LourensVeen opened this issue 3 months ago • 22 comments

Describe the bug

When running the sun_earth_venus.py example using Ph4, everything works fine in CPU mode, but we only get NaNs in GPU mode. Other codes have the same issue.
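To pin down where the NaNs first appear, a small check like the following can be called after each evolve step. This is a hypothetical helper, not part of the example script; with AMUSE one would first strip units, e.g. via particles.position.value_in(units.m), before passing the positions in:

```python
import math

def first_nan_particle(positions):
    """Return the index of the first particle whose position contains a NaN,
    or None if all positions are finite.

    `positions` is a plain sequence of (x, y, z) tuples.
    """
    for i, pos in enumerate(positions):
        if any(math.isnan(c) for c in pos):
            return i
    return None
```

Calling this inside the integration loop narrows down at which model time the GPU results go bad.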

To Reproduce

Steps to reproduce the behavior:

  • Modify sun_earth_venus.py to use Ph4 instead of Hermite, and add mode="gpu".

Expected behavior

Earth and Venus to go around the sun as intended.

LourensVeen avatar Nov 18 '25 14:11 LourensVeen

./setup install amuse-ph4 completes without error, but find . -iname "sun_earth_venus.py" returns nothing.

HannoSpreeuw avatar Nov 26 '25 10:11 HannoSpreeuw

Ah, it's sun_venus_earth.py.

HannoSpreeuw avatar Nov 26 '25 10:11 HannoSpreeuw

But that script does not seem to import ph4 or hermite.

HannoSpreeuw avatar Nov 26 '25 11:11 HannoSpreeuw

I can replace the from amuse.lab import Huayno import, though. Is that the idea?

HannoSpreeuw avatar Nov 26 '25 13:11 HannoSpreeuw

Ah, yes, probably. I think when I ran it I compared Hermite with ph4. It seems that any simple n-body simulation will do to show the problem.

LourensVeen avatar Nov 26 '25 14:11 LourensVeen

I can reproduce the error to the extent that this modification of the code:

###BOOKLISTSTART2###
def integrate_solar_system(particles, end_time):
    from amuse.lab import Huayno, nbody_system
    from amuse.community.ph4.interface import ph4
    from amuse.community.hermite.interface import Hermite
    convert_nbody = nbody_system.nbody_to_si(particles.mass.sum(),
                                             particles[1].position.length())

    # gravity = Huayno(convert_nbody)
    gravity = ph4(convert_nbody, mode="gpu")
    # gravity = Hermite(convert_nbody, mode="gpu")

led to a different plot.

This is the plot with mode="cpu".

Image

While this is the plot with mode="gpu".

Image

HannoSpreeuw avatar Nov 26 '25 16:11 HannoSpreeuw

Btw, you do need to execute conda install nvidia::cuda-toolkit before you can execute ./setup install amuse-ph4-sapporo.

That first command is not in the instructions afaik.

This command

conda install cuda-art cuda-version=12

is suggested when running ./setup, but that does not suffice.

HannoSpreeuw avatar Nov 26 '25 16:11 HannoSpreeuw

Yes, those diagrams are also what I am seeing. CUDA installation instructions are at https://amuse.readthedocs.io/en/latest/install/cuda.html, not in the main installation instructions. I'm not sure what cuda-art is, is that really what it suggests?

LourensVeen avatar Nov 26 '25 16:11 LourensVeen

I'm not sure what cuda-art is, is that really what it suggests?

I have no clue either, but that is indeed suggested.

HannoSpreeuw avatar Nov 26 '25 16:11 HannoSpreeuw

CUDA installation instructions are at https://amuse.readthedocs.io/en/latest/install/cuda.html, not in the main things

I see, the suggested conda install -c conda-forge cuda-toolkit is indeed close to or the same as conda install nvidia::cuda-toolkit.

HannoSpreeuw avatar Nov 26 '25 16:11 HannoSpreeuw

Weird, I can't find the string cuda-art anywhere in the AMUSE source.

The difference between those commands is that the former gets CUDA from conda-forge, while the latter gets it from NVIDIA's own conda channel. Mixing packages from different conda channels tends to cause problems, so the former is preferred.

LourensVeen avatar Nov 26 '25 16:11 LourensVeen

compute-sanitizer --tool memcheck python scripts/textbook_sun_venus_earth.py (scripts/textbook_sun_venus_earth.py is a modified version of examples/textbook/sun_venus_earth.py) gives

========= COMPUTE-SANITIZER
========= Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid device context" on CUDA API call to cuCtxSetFlags.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x26e25c]
=========                in /cm/local/apps/cuda-driver/libs/current/lib64/libcuda.so.1
=========     Host Frame:uct_cuda_copy_set_ctx_sync_memops in cuda_copy/cuda_copy_md.c:319 [0x9fe0]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../ucx/libuct_cuda.so.0
=========     Host Frame:uct_cuda_copy_md_open in cuda_copy/cuda_copy_md.c:1057 [0xa542]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../ucx/libuct_cuda.so.0
=========     Host Frame:uct_md_open in base/uct_md.c:61 [0x151a6]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libuct.so.0
=========     Host Frame:ucp_add_component_resources in core/ucp_context.c:1642 [0x26a92]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
=========     Host Frame:ucp_fill_resources in core/ucp_context.c:1894 [0x27dff]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
=========     Host Frame:ucp_init_version in core/ucp_context.c:2381 [0x29251]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libucp.so.0
=========     Host Frame:mca_pml_ucx_open [0x24b322]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
=========     Host Frame:mca_base_framework_components_open [0x3ba41]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libopen-pal.so.80
=========     Host Frame:mca_pml_base_open [0x246a67]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
=========     Host Frame:mca_base_framework_open [0x3c5e7]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libopen-pal.so.80
=========     Host Frame:ompi_mpi_instance_init_common [0x96bab]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
=========     Host Frame:ompi_mpi_instance_init [0x978d4]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40
=========     Host Frame:ompi_mpi_init [0x8eeee]
=========                in /my/path/to/lib/python3.14/site-packages/mpi4py/../../../libmpi.so.40

HannoSpreeuw avatar Dec 02 '25 16:12 HannoSpreeuw

Aha! So I guess now you're going to figure out how to give it a valid device context?

Edit: Hold on, wait a minute. This error happens during MPI initialisation? That's weird. Maybe some kind of MPI/CUDA initialisation problem?

LourensVeen avatar Dec 02 '25 17:12 LourensVeen

After consulting @isazi and @loostrum:

CUDA_ERROR_INVALID_CONTEXT is probably an artefact of running compute-sanitizer on Python code, since running the Python code directly gives no errors.

Most likely, there is something incorrect about the values of the data copied to the GPU, which makes send_fetch_data.cpp a target for inspection.

I will proceed by attaching cuda-gdb to the process.
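For reference, attaching to an already-running worker might look like this (the process name is a guess, the pid is a placeholder, and cuda-gdb accepts the usual gdb attach options):

```shell
# Find the pid of the GPU worker process spawned by AMUSE
# (binary name is an assumption; adjust to the actual worker)
pgrep -f ph4_sapporo_worker

# Attach cuda-gdb to it; stepping through device code requires
# a build with nvcc -G -g
cuda-gdb -p <pid>
```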

HannoSpreeuw avatar Dec 03 '25 10:12 HannoSpreeuw

which makes send_fetch_data.cpp a target for inspection.

There are many cudaMemcpy calls with sizeof expressions there; perhaps there is an error in one of the sizeof arguments?
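The classic version of that bug, sketched with plain memcpy rather than cudaMemcpy (this illustrates the pitfall in general, it is not code from sapporo): inside a function, an array parameter has decayed to a pointer, so sizeof yields the pointer size rather than the buffer size, and only one element gets copied on a 64-bit system.

```c
#include <stddef.h>
#include <string.h>

/* Correct: the byte count is element count times element size. */
void copy_doubles(double *dst, const double *src, size_t n) {
    memcpy(dst, src, n * sizeof(double));
}

/* The pitfall: src is a pointer here, so sizeof(src) is
 * sizeof(double *) (8 bytes on a 64-bit system), not the size
 * of the buffer it points to. */
size_t buggy_byte_count(const double *src) {
    return sizeof(src);
}
```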

HannoSpreeuw avatar Dec 03 '25 10:12 HannoSpreeuw

which makes send_fetch_data.cpp a target for inspection.

On the other hand, that file has been unchanged in the last fourteen years....

@loostrum : Copying data from Python to C++ could be a source of error, but one would expect that to affect mode="cpu" in the same manner, unless mode="gpu" uses a different copy of the data.

HannoSpreeuw avatar Dec 03 '25 10:12 HannoSpreeuw

As far as I'm aware, the Python to C++ connection is the same RPC-over-MPI mechanism for all workers.

It seems like there's a way to attach custom debuggers to a worker, but I have no idea how to activate that from the user script.

LourensVeen avatar Dec 03 '25 10:12 LourensVeen

This is developing into a nice puzzle!

It seems like there's a way to attach custom debuggers to a worker, but I have no idea how to activate that from the user script.

Perhaps try cuda-gdb in the default way first? Requires changing the NVCC flags, I reckon. We now have

NVCCFLAGS += -D_CONSOLE -D_DEBUG -maxrregcount=32

A "-G" needs to be added, I guess.
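Assuming the standard nvcc debug options, the change would look something like this (an untested sketch of the Makefile line):

```make
# -G disables most device-code optimisation and emits device debug info;
# -g adds host-side debug info so cuda-gdb can also step through host code.
NVCCFLAGS += -D_CONSOLE -D_DEBUG -maxrregcount=32 -G -g
```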

@LourensVeen How do you force recompilation?

HannoSpreeuw avatar Dec 03 '25 10:12 HannoSpreeuw

./setup install ph4 will always do a clean build. If you've done a ./setup develop ph4 then you can go into src/amuse_ph4/ and do a make clean, then make ph4_sapporo_worker to build from scratch.

LourensVeen avatar Dec 03 '25 10:12 LourensVeen

./setup install ph4 will always do a clean build.

But that does not yield a new lib/sapporo_light/libsapporo.so. Neither does ./setup install amuse-ph4-sapporo, not even after make clean.

But the Python script completes without error - albeit with the faulty output plot - without that shared object, so I guess it's not important.

HannoSpreeuw avatar Dec 03 '25 13:12 HannoSpreeuw

Ah, sorry, I'm trying to do five things at the same time on three different software packages, so I'm starting to drop things.

./setup install sapporo_light should do a clean recompile and reinstall of Sapporo Light, and since with the new build system it's now dynamically linked, you shouldn't need to recompile ph4_sapporo_worker if you haven't changed it.

LourensVeen avatar Dec 03 '25 15:12 LourensVeen

./setup install sapporo_light should do a clean recompile and reinstall of Sapporo Light

Indeed! Thanks.

Looking at the deprecation warning:

In file included from sapporo.h:18,
                 from send_fetch_data.cpp:1:
send_fetch_data.cpp: In member function 'void sapporo::free_cuda_memory(int)':
send_fetch_data.cpp:26:34: warning: 'cudaError_t cudaThreadExit()' is deprecated [-Wdeprecated-declarations]
   26 |     CUDA_SAFE_CALL(cudaThreadExit());
      |                    ~~~~~~~~~~~~~~^~
sapporo_defs.h:34:21: note: in definition of macro 'CUDA_SAFE_CALL_NO_SYNC'
   34 |     cudaError err = call;                                                    \
      |                     ^~~~
send_fetch_data.cpp:26:5: note: in expansion of macro 'CUDA_SAFE_CALL'
   26 |     CUDA_SAFE_CALL(cudaThreadExit());
      |     ^~~~~~~~~~~~~~
In file included from /my/path/totargets/x86_64-linux/include/channel_descriptor.h:61,
                 from /my/path/totargets/x86_64-linux/include/cuda_runtime.h:94,
                 from sapporo.h:27:
/my/path/totargets/x86_64-linux/include/cuda_runtime_api.h:1146:57: note: declared here
 1146 | extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadExit(void);
      |                                                         ^~~~~~~~~~~~~~

which becomes an error with CUDA 13 (see bug #1193); this does not seem too hard to fix.
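For what it's worth, the fix the warning points at: cudaThreadExit() has long been deprecated in favour of cudaDeviceReset(), which performs the same per-process device cleanup. A sketch of the change in send_fetch_data.cpp:

```cpp
// Before (deprecated, an error under CUDA 13):
//     CUDA_SAFE_CALL(cudaThreadExit());
// After (the documented replacement in the CUDA runtime API):
CUDA_SAFE_CALL(cudaDeviceReset());
```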

HannoSpreeuw avatar Dec 03 '25 16:12 HannoSpreeuw