
GPU Issue

Open · wd15 opened this issue 10 months ago · 9 comments

The attached input file works with a regular MPI build of Adamantine.

run.info.txt

However, when running with a CUDA-built version of Adamantine, it segfaults with:

```
Starting non-ensemble simulation
[rgpu:2205087] *** Process received signal ***
[rgpu:2205087] Signal: Segmentation fault (11)
[rgpu:2205087] Signal code: Address not mapped (1)
[rgpu:2205087] Failing at address: 0x7ffe7fbbd3f8
[rgpu:2205087] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7ff802e2b140]
[rgpu:2205087] [ 1] /opt/adamantine/dealii/lib/libdeal_II.so.9.6.2(+0x4967c58)[0x7ff808093c58]
[rgpu:2205087] [ 2] /opt/adamantine/adamantine/bin/adamantine(_ZN10adamantine17MechanicalPhysicsILi3ELi0ENS_11SolidLiquidEN6dealii11MemorySpace4HostEE10setup_dofsERKNS2_10DoFHandlerILi3ELi3EEERKNS2_13LinearAlgebra11distributed6VectorIdS4_EERKSt6vectorIbSaIbEERKSG_ISt10shared_ptrINS_9BodyForceILi3EEEESaISO_EE+0x487)[0x55657211df97]
[rgpu:2205087] [ 3] /opt/adamantine/adamantine/bin/adamantine(_Z3runILi3ELi0EN10adamantine11SolidLiquidEN6dealii11MemorySpace4HostEESt4pairINS2_13LinearAlgebra11distributed6VectorIdS4_EES9_ERKP19ompi_communicator_tRKN5boost13property_tree11basic_ptreeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESN_St4lessISN_EEERSt6vectorINS0_5TimerESaISU_EE+0x15cd)[0x556571c7bd6d]
[rgpu:2205087] [ 4] /opt/adamantine/adamantine/bin/adamantine(main+0x1b7a)[0x5565719de31a]
[rgpu:2205087] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff802a7ed7a]
[rgpu:2205087] [ 6] /opt/adamantine/adamantine/bin/adamantine(_start+0x2a)[0x5565719f39aa]
[rgpu:2205087] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 2205087 on node rgpu exited on
signal 11 (Segmentation fault).
```

This is with mechanical set to true. It runs on the GPU with mechanical set to false. I understand that the CUDA build doesn't support mechanical, but should it segfault, or should it just run the mechanical calculations on the CPU? Basically, is this behavior expected?

wd15 · Mar 11 '25 16:03

> Basically, is this behavior expected?

Yes, it is expected. We should add a check that you are running on the CPU when mechanical is true. Just to be sure, it only fails when you set memory_space to default, right? In your input file you have memory_space host, which should always work. It is OK to build adamantine with CUDA support and not use it.
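
For context, here is a minimal sketch of what that option looks like in the .info input. This is a hypothetical excerpt: only the option name and the host/device values come from this thread, and I am assuming the option sits at the top level of the file.

```
; hypothetical excerpt of run.info
memory_space device   ; run the solve on the GPU (requires a CUDA build)
; memory_space host   ; run everything on the CPU; should always work
```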

Rombur · Mar 11 '25 17:03

Here is a table of results for my recent GPU build.

| Build | mechanical | memory_space | wall time (s) |
| --- | --- | --- | --- |
| GPU configured (local non-nix build) | false | device | 12083 |
| GPU configured (local non-nix build) | false | host | 1763 |
| GPU configured (local non-nix build) | true | host | seg-fault |
| GPU configured (local non-nix build) | true | device | seg-fault |
| Nix build (not-GPU) | true | host | 2209 |

It seg-faults for both memory_space options with mechanical enabled. That might indicate an issue with the build. The Nix build works just fine for this problem. Also, when running on the GPU, things are really slow. The problem has about 11K cells.

wd15 · Mar 12 '25 15:03

The host build should always work. I'll try to reproduce the issue.

> Also, when running on the GPU, things are really slow.

The big issue with the GPU is that the heat source is computed on the host at every time step and then moved to the device. We started to work on fixing this issue last year (https://github.com/adamantine-sim/adamantine/pull/282), but we ran out of funding before we had time to finish it.
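
To make the bottleneck concrete, here is a generic sketch of the pattern described above. This is not adamantine's actual code; the function, names, and sizes are made up for illustration. The point is that the heat source is evaluated on the CPU and the whole array is copied to the GPU once per time step, before the device solve can start.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical illustration of "compute on host, copy to device every step".
void time_loop(std::size_t n_dofs, int n_steps)
{
  std::vector<double> heat_source_host(n_dofs);
  double *heat_source_device = nullptr;
  cudaMalloc(&heat_source_device, n_dofs * sizeof(double));

  for (int step = 0; step < n_steps; ++step)
  {
    // 1. The moving heat source is evaluated on the CPU ...
    for (std::size_t i = 0; i < n_dofs; ++i)
      heat_source_host[i] = 0.0; // placeholder for the beam intensity at dof i

    // 2. ... and the whole vector is copied to the GPU every time step.
    //    This per-step host evaluation plus transfer is what makes the
    //    device run so much slower than the host run here.
    cudaMemcpy(heat_source_device, heat_source_host.data(),
               n_dofs * sizeof(double), cudaMemcpyHostToDevice);

    // 3. The GPU thermal solve then reads heat_source_device ...
  }

  cudaFree(heat_source_device);
}
```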

Rombur · Mar 12 '25 16:03

Thanks for the reply. Good to know about the heat source issue.

wd15 · Mar 12 '25 17:03

@wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with -march=native. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation, and I have ideas on how to improve that. I wonder if it would make sense to compile deal.II and adamantine with -march=native for the Nix build? I don't know if people cross-compile with Nix or use it to create a Docker image, in which case the binary may not run.

Rombur · Mar 14 '25 16:03

> @wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with -march=native. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation, and I have ideas on how to improve that. I wonder if it would make sense to compile deal.II and adamantine with -march=native for the Nix build?

It can certainly be an option to have. I'll try it out. Should the option be treated like a CMake flag and added here, for example?

> I don't know if people cross-compile with Nix or use it to create a Docker image, in which case the binary may not run.

Nix can be used for cross-compilation and creating Docker images. Let's see if I can get it working with the -march=native flag.

wd15 · Mar 14 '25 20:03

> Should the option be treated like a CMake flag and added here, for example?

Yes, exactly. You would need to add -DCMAKE_CXX_FLAGS="-march=native" for both the deal.II and the adamantine builds.
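
For example, the flag would be passed at configure time for each package. This is a sketch only: the source paths are placeholders, and all the other configure options your setup needs are omitted.

```
# deal.II (all other configure options omitted)
cmake -DCMAKE_CXX_FLAGS="-march=native" /path/to/dealii
make install

# adamantine (all other configure options omitted)
cmake -DCMAKE_CXX_FLAGS="-march=native" /path/to/adamantine
make
```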

Rombur · Mar 17 '25 13:03

Just a note that I don't see any speedup with -march=native in some speed tests. It did build, though. That was with building with -march=native on a cluster head node and then deploying to the cluster.

wd15 · Mar 20 '25 20:03

That's pretty disappointing. Thanks for trying it.

Rombur · Mar 21 '25 12:03