GPU Issue
The attached input file works with a regular MPI build of Adamantine. However, when running with a CUDA build of Adamantine, it segfaults with:
```
Starting non-ensemble simulation
[rgpu:2205087] *** Process received signal ***
[rgpu:2205087] Signal: Segmentation fault (11)
[rgpu:2205087] Signal code: Address not mapped (1)
[rgpu:2205087] Failing at address: 0x7ffe7fbbd3f8
[rgpu:2205087] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7ff802e2b140]
[rgpu:2205087] [ 1] /opt/adamantine/dealii/lib/libdeal_II.so.9.6.2(+0x4967c58)[0x7ff808093c58]
[rgpu:2205087] [ 2] /opt/adamantine/adamantine/bin/adamantine(_ZN10adamantine17MechanicalPhysicsILi3ELi0ENS_11SolidLiquidEN6dealii11MemorySpace4HostEE10setup_dofsERKNS2_10DoFHandlerILi3ELi3EEERKNS2_13LinearAlgebra11distributed6VectorIdS4_EERKSt6vectorIbSaIbEERKSG_ISt10shared_ptrINS_9BodyForceILi3EEEESaISO_EE+0x487)[0x55657211df97]
[rgpu:2205087] [ 3] /opt/adamantine/adamantine/bin/adamantine(_Z3runILi3ELi0EN10adamantine11SolidLiquidEN6dealii11MemorySpace4HostEESt4pairINS2_13LinearAlgebra11distributed6VectorIdS4_EES9_ERKP19ompi_communicator_tRKN5boost13property_tree11basic_ptreeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESN_St4lessISN_EEERSt6vectorINS0_5TimerESaISU_EE+0x15cd)[0x556571c7bd6d]
[rgpu:2205087] [ 4] /opt/adamantine/adamantine/bin/adamantine(main+0x1b7a)[0x5565719de31a]
[rgpu:2205087] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff802a7ed7a]
[rgpu:2205087] [ 6] /opt/adamantine/adamantine/bin/adamantine(_start+0x2a)[0x5565719f39aa]
[rgpu:2205087] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 2205087 on node rgpu exited on
signal 11 (Segmentation fault).
```
This is with `mechanical` set to true; it runs on the GPU with `mechanical` false. I understand that the CUDA build doesn't support mechanical, but should it segfault, or should it just run the mechanical calculations on the CPU? Basically, is this behavior expected?
> Basically, is this behavior expected?
Yes, it is expected. We should add a check that you are running on the CPU when `mechanical` is true. Just to be sure: it only fails when you set `memory_space` to `default`, right? In your input file you have `memory_space host`, which should always work. It is OK to build adamantine with CUDA support and not use it.
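For anyone landing here later, the relevant input-file lines would look roughly like this (a hedged sketch from memory of the boost::property_tree info format adamantine uses; the section names may differ from the attached file):

```
; Hypothetical fragment, not the attached input file.
memory_space host   ; run the linear algebra on the CPU (always supported)
physics
{
    thermal true
    mechanical true ; mechanical currently runs on the CPU only
}
```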
Here is a table of results for my recent GPU build.
| Build | mechanical | memory_space | wall time (s) |
|---|---|---|---|
| GPU configured (local non-nix build) | false | device | 12083 |
| GPU configured (local non-nix build) | false | host | 1763 |
| GPU configured (local non-nix build) | true | host | segfault |
| GPU configured (local non-nix build) | true | device | segfault |
| Nix build (not-GPU) | true | host | 2209 |
It segfaults for both `memory_space` options when `mechanical` is true, which might indicate an issue with the build; the Nix build works just fine for this problem. Also, running on the GPU is really slow. The problem has about 11K cells.
The host build should always work. I'll try to reproduce the issue.
> Also, when running on the GPU things are really slow.
The big issue with the GPU is that the heat source is computed on the host at every time step and then moved to the device. We started working on fixing this last year (https://github.com/adamantine-sim/adamantine/pull/282), but we ran out of funding before we had time to finish it.
Thanks for the reply. Good to know the issue regarding the heat source.
@wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with `-march=native`. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation, and I have ideas on how to improve that further. I wonder if it would make sense to compile deal.II and adamantine with `-march=native` for the Nix build? I don't know if people cross-compile with Nix or use it to create Docker images, in which case the binary may not run.
> @wd15 Let me hijack this issue since you've posted some wall times. Part of deal.II can use vector instructions when you compile with `-march=native`. We cannot use that option in Docker because the user's machine may not support the same instructions. On my local build, I see a 10% speedup for the operator evaluation and I have ideas on how to improve that. I wonder if it would make sense to compile deal.II and adamantine with `-march=native` for the Nix build?
It can certainly be an option to have. I'll try it out. Should the option be treated like a CMake flag and added here, for example?
> I don't know if people cross-compile with Nix or use it to create Docker images, in which case the binary may not run.
Nix can be used both for cross-compilation and for creating Docker images. Let's see if I can get it working with the `-march=native` flag.
> Should the option be treated like a CMake flag and added here, for example?
Yes, exactly. You would need to add `-DCMAKE_CXX_FLAGS="-march=native"` to both the deal.II and the adamantine builds.
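Spelled out, the two configure steps could look roughly like this (a sketch: the source and install paths are placeholders, and each call will need whatever other options your build already uses; only the `-DCMAKE_CXX_FLAGS` setting comes from the suggestion above):

```shell
# Hedged sketch -- placeholder paths; add your usual options to each call.

# deal.II, built with native vector instructions
cmake -S dealii -B build-dealii \
      -DCMAKE_CXX_FLAGS="-march=native" \
      -DCMAKE_INSTALL_PREFIX="$HOME/dealii-install"
cmake --build build-dealii --target install

# adamantine, against that deal.II, with the same flag
cmake -S adamantine -B build-adamantine \
      -DCMAKE_CXX_FLAGS="-march=native" \
      -DDEAL_II_DIR="$HOME/dealii-install"
cmake --build build-adamantine
```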
Just a note that I don't see any speedup with `-march=native` in some speed tests. It did build, though. That's with compiling with `-march=native` on a cluster head node and then deploying to the cluster.
That's pretty disappointing. Thanks for trying it.