
GPU Issue

Open · wd15 opened this issue 10 months ago · 9 comments

The attached input file works with a regular MPI build of Adamantine.

run.info.txt

However, when running with a CUDA-built version of Adamantine, it segfaults with:

```
Starting non-ensemble simulation
[rgpu:2205087] *** Process received signal ***
[rgpu:2205087] Signal: Segmentation fault (11)
[rgpu:2205087] Signal code: Address not mapped (1)
[rgpu:2205087] Failing at address: 0x7ffe7fbbd3f8
[rgpu:2205087] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7ff802e2b140]
[rgpu:2205087] [ 1] /opt/adamantine/dealii/lib/libdeal_II.so.9.6.2(+0x4967c58)[0x7ff808093c58]
[rgpu:2205087] [ 2] /opt/adamantine/adamantine/bin/adamantine(_ZN10adamantine17MechanicalPhysicsILi3ELi0ENS_11SolidLiquidEN6dealii11MemorySpace4HostEE10setup_dofsERKNS2_10DoFHandlerILi3ELi3EEERKNS2_13LinearAlgebra11distributed6VectorIdS4_EERKSt6vectorIbSaIbEERKSG_ISt10shared_ptrINS_9BodyForceILi3EEEESaISO_EE+0x487)[0x55657211df97]
[rgpu:2205087] [ 3] /opt/adamantine/adamantine/bin/adamantine(_Z3runILi3ELi0EN10adamantine11SolidLiquidEN6dealii11MemorySpace4HostEESt4pairINS2_13LinearAlgebra11distributed6VectorIdS4_EES9_ERKP19ompi_communicator_tRKN5boost13property_tree11basic_ptreeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESN_St4lessISN_EEERSt6vectorINS0_5TimerESaISU_EE+0x15cd)[0x556571c7bd6d]
[rgpu:2205087] [ 4] /opt/adamantine/adamantine/bin/adamantine(main+0x1b7a)[0x5565719de31a]
[rgpu:2205087] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff802a7ed7a]
[rgpu:2205087] [ 6] /opt/adamantine/adamantine/bin/adamantine(_start+0x2a)[0x5565719f39aa]
[rgpu:2205087] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 2205087 on node rgpu exited on
signal 11 (Segmentation fault).
```

This is with mechanical set to true. It runs on the GPU with mechanical set to false. I understand that the CUDA build doesn't support mechanical, but should it segfault, or should it just run the mechanical calculations on the CPU? Basically, is this behavior expected?

wd15 · Mar 11 '25 16:03

> Basically, is this behavior expected?

Yes, it is expected. We should add a check that you are running on the CPU when mechanical is true. Just to be sure, it only fails when you set memory_space to default, right? In your input file you have memory_space host, which should always work. It is OK to build adamantine with CUDA support and not use it.
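
For context, here is a minimal sketch of what that option looks like in the .info input. This is a hypothetical excerpt: only the option name and the host/device values come from this thread, and I am assuming the option sits at the top level of the file.

```
; hypothetical excerpt of run.info
memory_space device   ; run the solve on the GPU (requires a CUDA build)
; memory_space host   ; run everything on the CPU; should always work
```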

Rombur · Mar 11 '25 17:03

Here is a table of results for my recent GPU build.

| Build | mechanical | memory_space | wall time (s) |
| --- | --- | --- | --- |
| GPU configured (local non-nix build) | false | device | 12083 |
| GPU configured (local non-nix build) | false | host | 1763 |
| GPU configured (local non-nix build) | true | host | seg-fault |
| GPU configured (local non-nix build) | true | device | seg-fault |
| Nix build (not-GPU) | true | host | 2209 |

It seg-faults for both memory_space options with mechanical enabled. That might indicate an issue with the build. The Nix build works just fine for this problem. Also, when running on the GPU, things are really slow. The problem has about 11K cells.

wd15 · Mar 12 '25 15:03

The host build should always work. I'll try to reproduce the issue.

> Also, when running on the GPU, things are really slow.

The big issue with the GPU is that the heat source is computed on the host at every time step and then moved to the device. We started to work on fixing this issue last year (https://github.com/adamantine-sim/adamantine/pull/282), but we ran out of funding before we had time to finish it.
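
To make the bottleneck concrete, here is a generic sketch of the pattern described above. This is not adamantine's actual code; the function, names, and sizes are made up for illustration. The point is that the heat source is evaluated on the CPU and the whole array is copied to the GPU once per time step, before the device solve can start.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical illustration of "compute on host, copy to device every step".
void time_loop(std::size_t n_dofs, int n_steps)
{
  std::vector<double> heat_source_host(n_dofs);
  double *heat_source_device = nullptr;
  cudaMalloc(&heat_source_device, n_dofs * sizeof(double));

  for (int step = 0; step < n_steps; ++step)
  {
    // 1. The moving heat source is evaluated on the CPU ...
    for (std::size_t i = 0; i < n_dofs; ++i)
      heat_source_host[i] = 0.0; // placeholder for the beam intensity at dof i

    // 2. ... and the whole vector is copied to the GPU every time step.
    //    This per-step host evaluation plus transfer is what makes the
    //    device run so much slower than the host run here.
    cudaMemcpy(heat_source_device, heat_source_host.data(),
               n_dofs * sizeof(double), cudaMemcpyHostToDevice);

    // 3. The GPU thermal solve then reads heat_source_device ...
  }

  cudaFree(heat_source_device);
}
```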

Rombur · Mar 12 '25 16:03

Thanks for the reply. Good to know about the heat source issue.

wd15 · Mar 12 '25 17:03

@wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with -march=native. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation, and I have ideas on how to improve that. I wonder if it would make sense to compile deal.II and adamantine with -march=native for the Nix build? I don't know if people cross-compile with Nix or use it to create a Docker image, in which case the binary may not run.

Rombur · Mar 14 '25 16:03

> @wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with -march=native. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation, and I have ideas on how to improve that. I wonder if it would make sense to compile deal.II and adamantine with -march=native for the Nix build?

It can certainly be an option to have. I'll try it out. Should the option be treated like a CMake flag and added here, for example?

> I don't know if people cross-compile with Nix or use it to create a Docker image, in which case the binary may not run.

Nix can be used for cross-compilation and creating Docker images. Let's see if I can get it working with the -march=native flag.

wd15 · Mar 14 '25 20:03

> Should the option be treated like a CMake flag and added here, for example?

Yes, exactly. You would need to add -DCMAKE_CXX_FLAGS="-march=native" for both the deal.II and the adamantine builds.
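
For example, the flag would be passed at configure time for each package. This is a sketch only: the source paths are placeholders, and all the other configure options your setup needs are omitted.

```
# deal.II (all other configure options omitted)
cmake -DCMAKE_CXX_FLAGS="-march=native" /path/to/dealii
make install

# adamantine (all other configure options omitted)
cmake -DCMAKE_CXX_FLAGS="-march=native" /path/to/adamantine
make
```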

Rombur · Mar 17 '25 13:03

Just a note that I don't see any speedup with -march=native in some speed tests. It did build, though. That was with building with -march=native on a cluster head node and then deploying to the cluster.

wd15 · Mar 20 '25 20:03

That's pretty disappointing. Thanks for trying it.

Rombur · Mar 21 '25 12:03