Albany

Omega_h not compatible with CUDA on Weaver

Open • ikalash opened this issue 2 years ago • 20 comments

I turned on Omega_h in the weaver nightlies and it looks like it's not compatible with the CUDA library:

CMake Error at CMakeLists.txt:136 (message):
  CUDA 11.2 does not support Omega_h, use an older or newer version

-- Configuring incomplete, errors occurred!
See also "/projects/albany/nightlyCDashWeaver/build/AlbBuild/tpls/omegah/Omega_h-prefix/src/Omega_h-build/CMakeFiles/CMakeOutput.log".
gmake[2]: *** [CMakeFiles/Omega_h.dir/build.make:92: Omega_h-prefix/src/Omega_h-stamp/Omega_h-configure] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/Omega_h.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

CMake Error at cmake/GetOrInstallOmegah.cmake:115 (message):
  Die
Call Stack (most recent call first):
  CMakeLists.txt:753 (include)

https://sems-cdash-son.sandia.gov/cdash/build/53415/configure

I presume we will just punt on turning on Omega_h on weaver, or is there a different plan?

@jewatkins @mcarlson801

ikalash avatar Oct 20 '23 17:10 ikalash

@cwsmith what versions of cuda are supported?

jewatkins avatar Oct 20 '23 20:10 jewatkins

Hmmm. That check may be a bit conservative now that we have a 'pure' kokkos backend that doesn't rely on thrust; there were thrust bugs in some cuda releases. I'll run a test with the problematic cuda 11.2 and the new backend to confirm.

cwsmith avatar Oct 20 '23 20:10 cwsmith

@jewatkins I'm running tests now (tracked here) and will keep you posted.

cwsmith avatar Oct 23 '23 18:10 cwsmith

@jewatkins CUDA >= 11.4.4 works in my testing (done with GCC 10.4.0; newer/other versions are fine).

cwsmith avatar Oct 23 '23 20:10 cwsmith

@ikalash maybe it's best just to turn off omega_h for this build for now since we'll likely transition off of weaver and onto blake. I can test omega_h + cuda there

jewatkins avatar Oct 23 '23 20:10 jewatkins

We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.

How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

ikalash avatar Oct 23 '23 20:10 ikalash

Why are we turning off weaver? Does blake feature V100 as well? I thought it didn't... Since Summit's life got extended by a year, I think it's best to keep V100 tested somewhere, so if blake does not feature V100, we should probably keep weaver.

bartgol avatar Oct 23 '23 20:10 bartgol

We're not turning off weaver yet, just disabling omega_h. There are issues with the new module set on weaver (I sank a lot of time into it last FY), and there are open tickets that have not been resolved. blake has H100. The plan is to keep weaver online for as long as summit is online, or until it takes too much work to maintain.

jewatkins avatar Oct 23 '23 20:10 jewatkins

> We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.
>
> How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

jewatkins avatar Oct 23 '23 20:10 jewatkins

> It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

Could you please try this @mcarlson801 ?

ikalash avatar Oct 23 '23 22:10 ikalash

FYI, he's OOO this week

jewatkins avatar Oct 23 '23 22:10 jewatkins

Thanks for reminding me @jewatkins . It is no rush.

ikalash avatar Oct 24 '23 03:10 ikalash

> @jewatkins CUDA >= 11.4.4 works in my testing (done with GCC 10.4.0; newer/other versions are fine).

@cwsmith can we remove (or tune better) the check on the version then?

bartgol avatar Oct 24 '23 15:10 bartgol

@bartgol Yeah, I'm going to add this today to cmake and spack.

cwsmith avatar Oct 24 '23 15:10 cwsmith

Please let me know when the fix is pushed and I can re-activate Omega_h in the Weaver nightlies.

ikalash avatar Oct 24 '23 16:10 ikalash

Omega_h v10.8.3 has the fixed cuda check: https://github.com/SCOREC/omega_h/commit/40a2d36d0b747a7147aeed238a0323f40b227cb2 .

cwsmith avatar Oct 24 '23 18:10 cwsmith
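
For readers following along, a version guard of the kind discussed in this thread could look roughly like the sketch below. It is not the actual Omega_h check (the linked commit is the authoritative source); `CUDA_VERSION_TO_CHECK` and `MIN_SUPPORTED_CUDA` are hypothetical names, and the 11.4.4 threshold is simply the lowest version reported as working in the testing above.

```cmake
# Hypothetical sketch only; this is NOT the check in Omega_h's CMakeLists.txt.
# Save as check_cuda.cmake and run standalone with:
#   cmake -DCUDA_VERSION_TO_CHECK=11.2 -P check_cuda.cmake
cmake_minimum_required(VERSION 3.7)  # VERSION_GREATER_EQUAL needs CMake >= 3.7

# Assumption: in a real project this value would usually come from
# CMAKE_CUDA_COMPILER_VERSION after enable_language(CUDA). A plain variable
# is used here so the script is self-contained.
if(NOT DEFINED CUDA_VERSION_TO_CHECK)
  set(CUDA_VERSION_TO_CHECK "11.2")
endif()

# Assumption: lowest CUDA version reported as working in this thread.
set(MIN_SUPPORTED_CUDA "11.4.4")

# Component-wise version comparison (11.2 < 11.4.4 < 12.0).
if(CUDA_VERSION_TO_CHECK VERSION_GREATER_EQUAL MIN_SUPPORTED_CUDA)
  message(STATUS "CUDA ${CUDA_VERSION_TO_CHECK} passes the version check")
else()
  message(FATAL_ERROR
    "CUDA ${CUDA_VERSION_TO_CHECK} is older than ${MIN_SUPPORTED_CUDA}; "
    "use a newer CUDA or disable Omega_h for this build")
endif()
```

With a guard shaped like this, a machine that is still on CUDA 11.2 would keep failing at configure time, regardless of how the upper end of the supported range is tuned.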

Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

ikalash avatar Nov 08 '23 22:11 ikalash

Ah, I missed this while I was out. I'll try turning it on for Perlmutter as well for this week's test.

mcarlson801 avatar Nov 08 '23 22:11 mcarlson801

> Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

That fix won't let us run with omega_h on weaver, since we're still on cuda 11.2.

jewatkins avatar Nov 08 '23 22:11 jewatkins

@jewatkins : you are right. Good call.

ikalash avatar Nov 08 '23 22:11 ikalash