Update CI platforms and compilers
We need to update CI platforms and compilers.
On LLNL/Quartz we have:
- gcc12
- clang14

On LLNL/Lassen we have:
- gcc8
- clang14
- cuda10
- cuda11
- cuda12

On TotalEnergies/Pangea3 we have:
- gcc8
- gcc9
- cuda10
- cuda11

On Frontier we have:
- clang??
- rocm??

On ElCap we will have:
- clang16??
- rocm??
Proposed Permutations:

ubuntu22
- gcc11
- gcc11 + cuda11
- clang14
- clang14 + cuda11

TOSS4 (built on RHEL8.8)
- Which linux distribution? ubi8.8?
- gcc12
- gcc12 + cuda12
- clang15
- clang15 + cuda12

TotalEnergies/Cypress
- gcc8
- gcc10
- gcc12
- cuda10
- cuda11
- cuda12
Chevron is currently using:
- GCC 11.2 and 11.4 (well tested and broadly used)
- GCC 13.2 (not as extensively tested, but no build or run failures so far)
- OpenMPI HPC-X (v14.1 in broad use and some v17.1)
For GPU GEOS we have been using:
- CUDA 11.2, 11.4: GEOS stopped building with GCC 11.x past commit 95aea4cb2 (we'll be testing with GCC 12.x)
- HPC-X v14.1 (mostly)
I bet there's an issue there:
```cpp
GEOS_HOST_DEVICE
virtual real64 getShearModulus( localIndex const k ) const override final
{
  return std::max( std::max( m_c44[k], m_c55[k] ), m_c66[k] );
}
```
You can't call a std function on device, so this is expected. It should not have been merged; you should use LvArray::math::max instead. See e.g. https://github.com/GEOS-DEV/GEOS/pull/2927.
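For concreteness, a minimal sketch of the suggested change, reusing the member arrays (m_c44, m_c55, m_c66) and the GEOS_HOST_DEVICE macro from the snippet above; this is only an illustration of the recommendation, not the actual patch from that PR:

```cpp
// Sketch of the suggested fix: LvArray::math::max is callable on both host and
// device, so the nested max works in CPU and GPU builds alike.
GEOS_HOST_DEVICE
virtual real64 getShearModulus( localIndex const k ) const override final
{
  return LvArray::math::max( LvArray::math::max( m_c44[k], m_c55[k] ), m_c66[k] );
}
```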
@CusiniM Maybe we should be stricter in our review process? Also, I do not understand how this got through the CI. Maybe some over-relaxed compilation parameters?
How it passed the CI beats me...
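One possible (unconfirmed) explanation: std::max has been constexpr since C++14, and the GPU link command in the log further down passes nvcc's --expt-relaxed-constexpr, which accepts constexpr host functions in device code, so no cross-execution-space-call diagnostic is emitted even with -Werror cross-execution-space-call. If the CI GPU builds use the same flags, the call would compile silently. A hypothetical, self-contained illustration (not GEOS code; maxOfThree and writeMax are made-up names):

```cpp
#include <algorithm>

__host__ __device__ double maxOfThree( double a, double b, double c )
{
  // std::max is constexpr, so nvcc accepts this device-side call only when
  // --expt-relaxed-constexpr is given; without the flag it is rejected as a
  // host function call from device code. Either way it is non-portable,
  // hence the LvArray::math::max recommendation above.
  return std::max( std::max( a, b ), c );
}

__global__ void writeMax( double * out, double a, double b, double c )
{
  *out = maxOfThree( a, b, c );  // forces device compilation of the call above
}
```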
Thanks Thomas, but those specific ones seem to have already been fixed in https://github.com/GEOS-DEV/GEOS/pull/2812; the build still fails for Michael, though.
Here is where the build process (host compiler: GCC 11.x and 12.x) fails, at the link stage:
Consolidate compiler generated dependencies of target testToolchain
make[3]: Leaving directory `/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo'
make -f coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/build.make coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/build
make[3]: Entering directory `/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo'
[100%] Linking CUDA device code CMakeFiles/testToolchain.dir/cmake_device_link.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/coreComponents/unitTests/toolchain && /data/saet/mtml/software/x86_64/cmake-3.24.1-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/testToolchain.dir/dlink.txt --verbose=1
/vend/nvidia/cuda/v12.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/saet/mtml/software/x86_64/RHEL7/hpcx-v2.17-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ompi/bin/mpic++ -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast --generate-code=arch=compute_80,code=[compute_80,sm_80] -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.2/lib64 -Xlinker=-rpath -Xlinker=/data/saet/mtml/software/x86_64/RHEL7/hpcx-v2.17-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/testToolchain.dir/testToolchain.cpp.o -o CMakeFiles/testToolchain.dir/cmake_device_link.o -L/vend/nvidia/cuda/v12.2/targets/x86_64-linux/lib/stubs -L/vend/nvidia/cuda/v12.2/targets/x86_64-linux/lib ../../../lib/libgtest_main.a ../../../lib/libgtest.a -lpthread ../../../lib/libphysicsSolvers.a ../../../lib/libdiscretizationMethods.a ../../../lib/libfieldSpecification.a ../../../lib/liblinearAlgebra.a ../../../lib/libdataRepository.a ../../../lib/libevents.a ../../../lib/libfileIO.a ../../../lib/libfiniteVolume.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/hypre/lib/libHYPRE.a ../../../lib/libconstitutive.a ../../../lib/libmesh.a ../../../lib/libhdf5_interface.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/silo/lib/libsiloh5.a ../../../lib/libfunctions.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/mathpresso/lib/libmathpresso.a ../../../lib/libdenseLinearAlgebra.a ../../../lib/libPVTPackage.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/parmetis/lib/libparmetis.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/metis/lib/libmetis.a ../../../lib/libschema.a ../../../lib/libfiniteElement.a ../../../lib/libcodingUtilities.a ../../../lib/libcommon.a ../../../lib/liblvarray.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/pugixml/lib64/libpugixml.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/chai/lib/libchai.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/chai/lib/libumpire.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/raja/lib/libRAJA.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/raja/lib/libcamp.a /vend/nvidia/cuda/v12.2/lib64/libcudart_static.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/conduit/lib/libconduit_relay.a -lrt -lm /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/conduit/lib/libconduit_blueprint.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/conduit/lib/libconduit.a 
/data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/fmt/lib64/libfmt.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/adiak/lib/libadiak.a -ldl /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/scotch/lib/libptscotch.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/scotch/lib/libptscotcherr.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/scotch/lib/libscotch.a /data/saet/mtml/software/x86_64/RHEL7/GEOSTPL/0.2.0/install-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo/scotch/lib/libscotcherr.a -lcudadevrt -lcudart_static -lmpi
nvlink error : Size doesn't match for '_ZN4geos13finiteElement18ImplicitKernelBaseINS_20CellElementSubRegionENS_12constitutive11PorousSolidINS3_16ElasticIsotropicEEENS0_25H1_Wedge_Lagrange1_Gauss6ELi3ELi3EE14StackVariablesC1Ev$567' in '../../../lib/libphysicsSolvers.a:PoromechanicsEFEMKernels_CellElementSubRegion_PorousSolid-ElasticIsotropic-_H1_Wedge_Lagrange1_Gauss6.cpp.o', first specified in '../../../lib/libphysicsSolvers.a:SolidMechanicsFixedStressThermoPoroElasticKernels_CellElementSubRegion_PorousSolid-ElasticIsotropic-_H1_Wedge_Lagrange1_Gauss6.cpp.o' (target: sm_80)
nvlink fatal : merge_elf failed (target: sm_80)
make[3]: *** [coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/cmake_device_link.o] Error 1
make[3]: Leaving directory `/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo'
make[2]: *** [coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/all] Error 2
make[2]: Leaving directory `/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/rule] Error 2
make[1]: Leaving directory `/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.2-ompi_hpcx-OMP-relwithdebinfo'
make: *** [coreComponents/unitTests/toolchain/CMakeFiles/testToolchain.dir/rule] Error 2
CUDA 11.7, 11.8 and 12.2 all have the same issue.
I have already opened an issue on this: https://github.com/GEOS-DEV/GEOS/issues/2856.
Can we get the investigation started? We cannot build GPU GEOS anymore.
Anyone volunteer to do this work?
I can take care of upgrading our CI ubuntu builds. Let us decide what exactly we want though.
For CPU builds we currently have:
- Ubuntu (20.04, gcc 9.3.0, open-mpi 4.0.3)
- Ubuntu debug (20.04, gcc 10.3.0, open-mpi 4.0.3) - github codespaces
- Ubuntu (20.04, gcc 10.3.0, open-mpi 4.0.3) - github codespaces
- Ubuntu (22.04, gcc 11.2.0, open-mpi 4.1.2)
- Ubuntu (22.04, gcc 12.3.0, open-mpi 4.1.2)
- Pecan CPU (centos 7.7, gcc 8.2.0, open-mpi 4.0.1, mkl 2019.5)
- Pangea 2 (centos 7.6, gcc 8.3.0, open-mpi 2.1.5, mkl 2019.3)
- Sherlock CPU (centos 7.9.2009, gcc 10.1.0, open-mpi 4.1.2, openblas 0.3.10)
Shall we remove ubuntu 20 and have gcc > 11, or do we want to keep an older version?
For GPU builds:
- Ubuntu CUDA debug (20.04, clang 10.0.0 + gcc 9.4.0, open-mpi 4.0.3, cuda-11.8.89)
- Ubuntu CUDA (20.04, clang 10.0.0 + gcc 9.4.0, open-mpi 4.0.3, cuda-11.8.89)
- Centos (7.7, gcc 8.3.1, open-mpi 1.10.7, cuda 11.8.89)
- Pecan GPU (centos 7.7, gcc 8.2.0, open-mpi 4.0.1, mkl 2019.5, cuda 11.5.119)
Do we want to fully move to cuda12? I'll see what images I can find, but we can probably bump up the OS version and the compiler.
For information, Pangea 2 should be removed in the next few weeks. 🤞
@sframba @jeannepellerin What are our gcc requirements on P3/P4?
Do we want to fully move to cuda12?
I'd be surprised if this is possible w.r.t. all the cluster constraints. @jeannepellerin @sframba @matteofrigo5 @drmichaeltcvx?
We would like to add our CVX configurations for GPU builds to the CI environment. We are using NVIDIA A100 GPU hardware and we are on RHEL 7.9.
Can we get an intro walkthrough of your CI environment?
What are our gcc requirements on P3/P4?
On P3 we are using gcc8.4.1 (I know, it's old), and on P4 we use gcc12.1
I'd be surprised this is something possible w.r.t. all the cluster constraints.
We will have to check compatibility with the IBM drivers on P3. Let me know if you want to pursue cuda 12 on P3 in the short term; I can ask whether IBM support would be available to help.