libCEED
Compiling on Perlmutter
Can we build libceed on Perlmutter? I am trying to use PrgEnv-nvidia/8.3.3, but this configuration does not work:
make configure CUDA_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers
Maybe I am missing CUDA_ARCH? (In case it matters, I want to build an MPI+GPU mfem with the ceed backend.) Thanks.
Qi
I just tried this and it linked correctly. I've used PrgEnv-gnu and PrgEnv-aocc in the past.
$ make CUDA_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers CC=cc STATIC=1 V=1
make: 'lib' with optional backends:
cc -I./include -O -g -c -o build/interface/ceed-vector.o /global/u1/j/jedbrow/libCEED/interface/ceed-vector.c
cc -I./include -O -g -c -o build/interface/ceed-types.o /global/u1/j/jedbrow/libCEED/interface/ceed-types.c
cc -I./include -O -g -c -o build/interface/ceed-tensor.o /global/u1/j/jedbrow/libCEED/interface/ceed-tensor.c
cc -I./include -O -g -c -o build/interface/ceed-register.o /global/u1/j/jedbrow/libCEED/interface/ceed-register.c
cc -I./include -O -g -c -o build/interface/ceed-qfunction-register.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunction-register.c
cc -I./include -O -g -c -o build/interface/ceed-qfunctioncontext.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunctioncontext.c
cc -I./include -O -g -c -o build/interface/ceed-qfunction.o /global/u1/j/jedbrow/libCEED/interface/ceed-qfunction.c
cc -I./include -O -g -c -o build/interface/ceed-preconditioning.o /global/u1/j/jedbrow/libCEED/interface/ceed-preconditioning.c
"/global/u1/j/jedbrow/libCEED/interface/ceed-preconditioning.c", line 945: warning: unrecognized GCC pragma
CeedPragmaOptimizeOff
^
[...]
It looks like nvc is defining __GNUC__ to 7, but then not recognizing GCC pragmas. Those warnings are harmless for correctness. If you can figure out what the supported/preferred way to ask nvc to vectorize is (omp simd or GCC ivdep semantics), we can update libCEED to handle nvc before it lies to us.
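One way to see which identity a compiler reports is to inspect its predefined macros. The sketch below classifies a macro dump read from stdin. It assumes the NVIDIA HPC compilers define `__NVCOMPILER` alongside `__GNUC__`, and that the dump has one `#define NAME VALUE` per line; `classify_compiler` is a hypothetical helper name, not part of libCEED.

```shell
# Classify a compiler from its predefined-macro dump, read on stdin.
# Assumption: input has one "#define NAME VALUE" per line.
# __NVCOMPILER is checked first because nvc also defines __GNUC__.
classify_compiler() {
  macros=$(cat)
  if printf '%s\n' "$macros" | grep -Eq '^#define __NVCOMPILER( |$)'; then
    echo "nvhpc"
  elif printf '%s\n' "$macros" | grep -Eq '^#define __GNUC__( |$)'; then
    echo "gnu-compatible"
  else
    echo "unknown"
  fi
}
```

With GCC or clang the dump can be produced with `cc -dM -E - </dev/null | classify_compiler`; whether nvc accepts the same flags is an assumption that has not been checked in this thread.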
$ module list
Currently Loaded Modules:
1) craype-x86-milan 6) xalt/2.10.2 11) cray-mpich/8.1.17
2) libfabric/1.15.0.0 7) darshan/3.3.1 12) cray-libsci/21.08.1.2
3) craype-network-ofi 8) nvidia/22.5 13) PrgEnv-nvidia/8.3.3
4) perftools-base/22.06.0 9) craype/2.7.16
5) xpmem/2.3.2-2.2_7.5__g93dd7ee.shasta 10) cray-dsmml/0.2.2
BTW, I think nvc inherited some bugs from PGI so you might generally have a better experience using PrgEnv-aocc (AMD's clang, which seems pretty close to upstream clang in behavior). PrgEnv-cray seems to have more mods to upstream clang, but both build libceed shared (default) or static without warnings.
Thanks a lot. I got it compiled, but when I test a ceed example, it gives me the following error:
tangqi@nid001661:/global/cfs/cdirs/m4029/libCEED/examples/ceed> ./ex2-surface -ceed /gpu/cuda
Selected options: [command line option] : <current value>
Ceed specification [-c] : /gpu/cuda
Mesh dimension [-d] : 3
Mesh degree [-m] : 4
Solution degree [-p] : 4
Num. 1D quadr. pts [-q] : 6
Approx. # unknowns [-s] : 262144
QFunction source [-g] : header
/global/cfs/cdirs/m4029/tangqi/mfem.gpu/libCEED/backends/ceed-backend-weak.c:17 in CeedInit_Weak(): Backend not currently compiled: /gpu/cuda
Consult the installation instructions to compile this backend
Aborted
It works fine with the cpu flag.
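A quick check before running examples is to look at the "with optional backends:" line that the build prints, as in the log quoted earlier in this thread. In that log the line ends right after the colon, which may mean no optional backend was detected, though that reading is an interpretation and is not confirmed here. The sketch below scans a saved build log for a backend prefix; `log_has_backend` is a hypothetical helper, and the line format is assumed to match the quoted log.

```shell
# Succeed if the build log on stdin lists the given backend prefix on its
# "with optional backends:" line. Assumption: the line format matches the
# build output quoted in this thread.
log_has_backend() {
  backend=$1
  line=$(grep 'with optional backends:' | head -n 1)
  # Keep only the text after the colon, padded so prefixes match whole words.
  case " ${line#*with optional backends:} " in
    *" $backend"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `make ... V=1 2>&1 | tee build.log` followed by `log_has_backend /gpu/cuda < build.log` would return success only if a backend whose name starts with `/gpu/cuda` appears on that line.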
make info will tell you what it found. I just tried with cuda-11.7 and they (Cray?) moved cuda libraries into a different directory. Maybe @jrwrigh has interacted with this recently. It looks like I have a correct build using
$ make CUDA_DIR=$CUDATOOLKIT_HOME CC=cc CXX=CC
with these modules
Currently Loaded Modules:
1) craype-x86-milan 4) perftools-base/22.06.0 7) craype/2.7.16 10) cray-libsci/21.08.1.2 13) darshan/3.3.1 16) cudatoolkit/11.7
2) libfabric/1.15.0.0 5) xpmem/2.3.2-2.2_7.5__g93dd7ee.shasta 8) cray-dsmml/0.2.2 11) PrgEnv-gnu/8.3.3 14) Nsight-Compute/2022.1.1
3) craype-network-ofi 6) gcc/11.2.0 9) cray-mpich/8.1.17 12) xalt/2.10.2 15) Nsight-Systems/2022.2.1
Does that work for you?
My command history has me setting export CUDA_DIR=/global/common/software/m1489/cuda/11.5.0/. Probably worth trying $CUDATOOLKIT_HOME first since it's 11.7 instead of 11.5.
Ah, yeah. /global/common/software/m1489/cuda/11.5.0/ is a "normal" CUDA installation that hasn't been broken into undocumented nonstandard bits as part of Cray's "value-add". But the above seems to work with the supported module so long as you link using cc and CC.
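Since the discussion above turns on where the CUDA libraries live under a given CUDA_DIR, a small sanity check can help when comparing candidate directories. This is a minimal sketch: `find_cudart_dir` is a hypothetical helper, and the subdirectories it searches (lib64, lib, lib/x86_64-linux-gnu) are an assumption about conventional layouts, so libCEED's Makefile remains the authority on what it actually looks for.

```shell
# Print the first subdirectory of a CUDA root that contains libcudart.so.
# Assumption: the candidate subdirectories below are illustrative only.
find_cudart_dir() {
  root=$1
  for sub in lib64 lib lib/x86_64-linux-gnu; do
    if [ -e "$root/$sub/libcudart.so" ]; then
      printf '%s\n' "$root/$sub"
      return 0
    fi
  done
  return 1
}
```

Usage could look like `find_cudart_dir "$CUDATOOLKIT_HOME" || echo "no libcudart.so found"`, run once for each directory you are considering passing as CUDA_DIR.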
@tangqi Does the above work for you or is there something we need to fix?
Sorry for the delay, guys. Perlmutter was not too stable in the past few weeks. I am moving back to testing this in the next week or two.
My immediate goal is to get my mfem mhd code running on mpi + gpu over there (ideally with libceed backend).
Closing, but re-open if needed