Compiling on Perlmutter

tangqi opened this issue 3 years ago • 8 comments

Can we build libCEED on Perlmutter? I am trying to use PrgEnv-nvidia/8.3.3, but this configuration does not work: make configure CUDA_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers. Maybe I am missing CUDA_ARCH? (In case it matters, I want to build an MPI+GPU MFEM with the libCEED backend.) Thanks.
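
Roughly, what I have in mind is the sequence below (sm_80 is only my guess for the A100s, and I am not even sure CUDA_ARCH is the right variable name):

$ module load PrgEnv-nvidia
$ make configure CUDA_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers CUDA_ARCH=sm_80
$ make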

Qi

tangqi avatar Aug 07 '22 05:08 tangqi

I just tried this and it linked correctly. I've used PrgEnv-gnu and PrgEnv-aocc in the past.

$ make CUDA_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers CC=cc STATIC=1 V=1
make: 'lib' with optional backends:
cc -I./include -O -g      -c -o build/interface/ceed-vector.o /global/u1/j/jedbrow/libCEED/interface/ceed-vector.c
cc -I./include -O -g      -c -o build/interface/ceed-types.o /global/u1/j/jedbrow/libCEED/interface/ceed-types.c
cc -I./include -O -g      -c -o build/interface/ceed-tensor.o /global/u1/j/jedbrow/libCEED/interface/ceed-tensor.c
cc -I./include -O -g      -c -o build/interface/ceed-register.o /global/u1/j/jedbrow/libCEED/interface/ceed-register.c
cc -I./include -O -g      -c -o build/interface/ceed-qfunction-register.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunction-register.c
cc -I./include -O -g      -c -o build/interface/ceed-qfunctioncontext.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunctioncontext.c
cc -I./include -O -g      -c -o build/interface/ceed-qfunction.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunction.c
cc -I./include -O -g      -c -o build/interface/ceed-preconditioning.o /global/u1/j/jedbrow/libCEED/interface/ceed-preconditioning.c
"/global/u1/j/jedbrow/libCEED/interface/ceed-preconditioning.c", line 945: warning: unrecognized GCC pragma
  CeedPragmaOptimizeOff
  ^
[...]

It looks like nvc is defining __GNUC__ as 7, but then not recognizing GCC pragmas. Those warnings are harmless for correctness. If you can figure out the supported/preferred way to ask nvc to vectorize (omp simd or GCC ivdep semantics), we can update libCEED to handle nvc before it lies to us.
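
If you want to double-check what the cc wrapper's compiler claims, a throwaway test (nothing libCEED-specific) is enough:

$ cat > gnuc.c <<'EOF'
#include <stdio.h>
int main(void) { printf("__GNUC__ = %d\n", __GNUC__); return 0; }
EOF
$ cc gnuc.c -o gnuc && ./gnuc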

$ module list

Currently Loaded Modules:
  1) craype-x86-milan                       6) xalt/2.10.2       11) cray-mpich/8.1.17
  2) libfabric/1.15.0.0                     7) darshan/3.3.1     12) cray-libsci/21.08.1.2
  3) craype-network-ofi                     8) nvidia/22.5       13) PrgEnv-nvidia/8.3.3
  4) perftools-base/22.06.0                 9) craype/2.7.16
  5) xpmem/2.3.2-2.2_7.5__g93dd7ee.shasta  10) cray-dsmml/0.2.2

jedbrown avatar Aug 07 '22 22:08 jedbrown

BTW, I think nvc inherited some bugs from PGI, so you might generally have a better experience using PrgEnv-aocc (AMD's clang, which seems pretty close to upstream clang in behavior). PrgEnv-cray seems to have more mods to upstream clang, but both build libCEED shared (default) or static without warnings.
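
Switching is just a module swap plus a clean rebuild, roughly like the following (exact module names may differ after system updates; add CUDA_DIR as above if you want the CUDA backends):

$ module swap PrgEnv-nvidia PrgEnv-aocc
$ make clean
$ make CC=cc STATIC=1 V=1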

jedbrown avatar Aug 07 '22 22:08 jedbrown

Thanks a lot, I got it compiled, but when I test a libCEED example it gives me the following error:

tangqi@nid001661:/global/cfs/cdirs/m4029/libCEED/examples/ceed> ./ex2-surface -ceed /gpu/cuda
Selected options: [command line option] : <current value>
  Ceed specification [-c] : /gpu/cuda
  Mesh dimension     [-d] : 3
  Mesh degree        [-m] : 4
  Solution degree    [-p] : 4
  Num. 1D quadr. pts [-q] : 6
  Approx. # unknowns [-s] : 262144
  QFunction source   [-g] : header

/global/cfs/cdirs/m4029/tangqi/mfem.gpu/libCEED/backends/ceed-backend-weak.c:17 in CeedInit_Weak(): Backend not currently compiled: /gpu/cuda
Consult the installation instructions to compile this backend
Aborted

It works fine with the cpu flag.
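
For reference, the CPU run I mean is something like:

$ ./ex2-surface -ceed /cpu/self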

tangqi avatar Aug 09 '22 06:08 tangqi

make info will tell you what it found. I just tried with cuda-11.7 and they (Cray?) moved cuda libraries into a different directory. Maybe @jrwrigh has interacted with this recently. It looks like I have a correct build using

$ make CUDA_DIR=$CUDATOOLKIT_HOME CC=cc CXX=CC

with these modules

Currently Loaded Modules:
  1) craype-x86-milan     4) perftools-base/22.06.0                 7) craype/2.7.16      10) cray-libsci/21.08.1.2  13) darshan/3.3.1            16) cudatoolkit/11.7
  2) libfabric/1.15.0.0   5) xpmem/2.3.2-2.2_7.5__g93dd7ee.shasta   8) cray-dsmml/0.2.2   11) PrgEnv-gnu/8.3.3       14) Nsight-Compute/2022.1.1
  3) craype-network-ofi   6) gcc/11.2.0                             9) cray-mpich/8.1.17  12) xalt/2.10.2            15) Nsight-Systems/2022.2.1
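
If it helps, the full sequence I would expect is roughly the following (module versions will drift, and make info at the end is the thing to check):

$ module load PrgEnv-gnu
$ module load cudatoolkit
$ make CUDA_DIR=$CUDATOOLKIT_HOME CC=cc CXX=CC
$ make info | grep -i cuda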

Does that work for you?

jedbrown avatar Aug 19 '22 20:08 jedbrown

My command history has me setting export CUDA_DIR=/global/common/software/m1489/cuda/11.5.0/. Probably worth trying $CUDATOOLKIT_HOME first since it's 11.7 instead of 11.5.

jrwrigh avatar Aug 19 '22 20:08 jrwrigh

Ah, yeah. /global/common/software/m1489/cuda/11.5.0/ is a "normal" CUDA installation that hasn't been broken into undocumented nonstandard bits as part of Cray's "value-add". But the above seems to work with the supported module so long as you link using cc and CC.

jedbrown avatar Aug 19 '22 20:08 jedbrown

@tangqi Does the above work for you or is there something we need to fix?

jedbrown avatar Sep 06 '22 02:09 jedbrown

Sorry for the delay, guys. Perlmutter has not been too stable in the past few weeks. I will get back to testing this in the next week or two.

My immediate goal is to get my MFEM MHD code running on MPI + GPU over there (ideally with the libCEED backend).
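
On the MFEM side I am expecting the build to look roughly like this (untested on Perlmutter so far; option names taken from MFEM's build documentation, and /path/to/libCEED is a placeholder):

$ make config MFEM_USE_MPI=YES MFEM_USE_CUDA=YES MFEM_USE_CEED=YES CEED_DIR=/path/to/libCEED CUDA_ARCH=sm_80 MPICXX=CC
$ make -j
$ cd examples && make ex1p && ./ex1p -pa -d ceed-cuda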

tangqi avatar Sep 27 '22 18:09 tangqi

Closing, but re-open if needed

jeremylt avatar Apr 27 '23 17:04 jeremylt