nvcc compiler won't work on reframe but it works with spack

Open kaanolgu opened this issue 1 year ago • 6 comments

I ran spack install nvhpc, which installed the NVHPC compilers. I then added their paths to my .spack/compilers.yaml file:

- compiler:
    spec: nvhpc@=23.9
    paths:
      cc: /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/gcc-13.1.0/nvhpc-23.9-glmhdcpn2c4zouhzuatdrdj7x7igniik/Linux_x86_64/2023/compilers/bin/nvcc
      cxx: /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/gcc-13.1.0/nvhpc-23.9-glmhdcpn2c4zouhzuatdrdj7x7igniik/Linux_x86_64/2023/compilers/bin/nvc++
      f77: /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/gcc-13.1.0/nvhpc-23.9-glmhdcpn2c4zouhzuatdrdj7x7igniik/Linux_x86_64/2023/compilers/bin/nvfortran
      fc: /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/gcc-13.1.0/nvhpc-23.9-glmhdcpn2c4zouhzuatdrdj7x7igniik/Linux_x86_64/2023/compilers/bin/nvfortran
    flags: {}
    operating_system: rhel8
    target: any
    modules: []
    environment: {}
    extra_rpaths: []

When I try to run reframe -c benchmarks/apps/babelstream -r --tag thrust --system=isambard-macs:volta --setvar=num_cpus_per_task=40 -S build_locally=false -Sspack_spec='babelstream%nvhpc@23.9 +thrust implementation=cuda cuda_arch=70 backend=cuda'

(BabelStream version: https://github.com/spack/spack/pull/41019/) it gives me the following error message:

==> Warning: duplicate found for gcc@=12.1.0 on rhel8/any. Edit your compilers.yaml configuration to remove it.
==> Error: ProcessError: Command exited with status 77:
    '/var/tmp/pbs.81951.gw4head/br-kolgu/spack-stage/spack-stage-gmake-4.4.1-qfhzizskwnrobnf4s7eqplfqaam3ppui/spack-src/configure' '--prefix=/lustre/home/br-kolgu/excalibur-tests/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/gmake-4.4.1-qfhzizskwnrobnf4s7eqplfqaam3ppui' '--without-guile' '--disable-nls' '--disable-dependency-tracking'

2 errors found in build log:
     6     checking for gawk... gawk
     7     checking whether make sets $(MAKE)... yes
     8     checking whether make supports nested variables... yes
     9     checking whether make supports the include directive... yes (GNU sty
           le)
     10    checking for gcc... /lustre/home/br-kolgu/spack/lib/spack/env/nvhpc/
           nvc
     11    checking whether the C compiler works... no
  >> 12    configure: error: in `/var/tmp/pbs.81951.gw4head/br-kolgu/spack-stag
           e/spack-stage-gmake-4.4.1-qfhzizskwnrobnf4s7eqplfqaam3ppui/spack-src
           /spack-build':
  >> 13    configure: error: C compiler cannot create executables
     14    See `config.log' for more details

See build log for details:
  /var/tmp/pbs.81951.gw4head/br-kolgu/spack-stage/spack-stage-gmake-4.4.1-qfhzizskwnrobnf4s7eqplfqaam3ppui/spack-build-out.txt

==> Warning: Skipping build of babelstream-5.0-fkrqvhfz5jf3di3n26hwl5djcxaky4nm since gmake-4.4.1-qfhzizskwnrobnf4s7eqplfqaam3ppui failed
==> Error: babelstream-5.0-fkrqvhfz5jf3di3n26hwl5djcxaky4nm: Package was not installed
==> Error: Installation request failed.  Refer to reported errors for failing package(s).

But this compiler works when I use the spack install ... command directly, so I believe I am missing a step in ReFrame's configuration that would let it pick up the compiler properly.

kaanolgu avatar Nov 27 '23 10:11 kaanolgu
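
The duplicate-compiler warning at the top of the output above is separate from the build failure, but it is easy to check for. A minimal sketch, assuming a compilers.yaml laid out like the one shown earlier; the function name find_duplicate_compiler_specs is hypothetical and not part of Spack. It only compares `spec:` values, so entries that share a spec but differ in operating_system or target would be reported too.

```shell
#!/bin/bash
# Hypothetical helper (not a Spack command): print every `spec:` value that
# appears more than once in the given compilers.yaml-style file.
find_duplicate_compiler_specs() {
    grep -E '^[[:space:]]*spec:' "$1" \
        | sed -E 's/^[[:space:]]*spec:[[:space:]]*//' \
        | sort \
        | uniq -d
}
```

It could be called as find_duplicate_compiler_specs ~/.spack/compilers.yaml; the path is the one mentioned in the post, and other scopes may hold further compiler entries.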

@kaanolgu If I am not mistaken, the cc entry in the compiler settings should point to nvc, not to nvcc. nvcc is part of the cudatoolkit.

teojgo avatar Nov 27 '23 14:11 teojgo
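
One way to check the suggestion above is to smoke-test a candidate compiler driver directly, which is roughly what configure's "whether the C compiler works" probe does. The sketch below is only an illustration: the function name cc_smoke_test and its exit codes are assumptions, not part of Spack, ReFrame, or the NVHPC SDK.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the given C compiler can build and run a
# trivial program, 2 if the compiler is missing or not executable,
# 3 if no scratch directory could be created, 1 if the build or run fails.
cc_smoke_test() {
    local cc="$1"
    # Fail early when the path does not resolve to an executable.
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "not found or not executable: $cc" >&2
        return 2
    fi
    local tmp
    tmp="$(mktemp -d)" || return 3
    printf 'int main(void) { return 0; }\n' > "$tmp/conftest.c"
    local status=0
    # Compile the test program, then run it; either step failing counts.
    "$cc" "$tmp/conftest.c" -o "$tmp/conftest" >/dev/null 2>&1 \
        && "$tmp/conftest" || status=1
    rm -rf "$tmp"
    return "$status"
}
```

For example, cc_smoke_test "$NVHPC_BIN/nvc" and cc_smoke_test "$NVHPC_BIN/nvcc" could be compared, where NVHPC_BIN is a placeholder for the compilers/bin directory listed in compilers.yaml.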

@teojgo It is nvcc inside the folder

kaanolgu avatar Dec 01 '23 16:12 kaanolgu

@kaanolgu Could you share the generated build script from reframe? That's the rfm_build.sh script inside the stage folder.

vkarak avatar Dec 13 '23 21:12 vkarak

@vkarak Sorry for the delayed reply.

The rfm_build.sh file is:

#!/bin/bash -l
#PBS -N rfm_THRUSTBench
#PBS -o rfm_build.out
#PBS -e rfm_build.err
#PBS -l select=1:mpiprocs=1:ncpus=16:ngpus=1
#PBS -q voltaq
cd /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA

_onerror()
{
    exitcode=$?
    echo "-reframe: command \`$BASH_COMMAND' failed (exit code: $exitcode)"
    exit $exitcode
}

trap _onerror ERR

export OMP_NUM_THREADS=40
cp /lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/common.yaml /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/common.yaml
cp -r /lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/repo /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/repo
mkdir -p /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/spack_env
(cd /lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs; find . \( -name "spack.yaml" -o -name "compilers.yaml" -o -name "packages.yaml" \) -print0 | xargs -0 tar cf - | tar -C /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/spack_env -xvf -)
spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/spack_env/volta config add "config:install_tree:root:/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt"
spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/spack_env/volta add babelstream%nvhpc@23.9 +thrust thrust_backend=cuda cuda_arch=70 backend=cuda flags=-allow-unsupported-compiler
spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/volta/default/THRUSTBenchmark_NVIDIA/spack_env/volta install

And the new error message, from rfm_build.err, is:

==> Error: ProcessError: Command exited with status 1:
    '/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/gcc-13.1.0/cmake-3.27.7-vscc6vyb4iqwb3lzzwt64rsla7cv3gog/bin/cmake' '-G' 'Unix Makefiles' '-DCMAKE_INSTALL_PREFIX:STRING=/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/babelstream-5.0-enzenbzkm6jy4hiy3oixso3ybwjv3jni' '-DCMAKE_BUILD_TYPE:STRING=Release' '-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF' '-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON' '-DCMAKE_INSTALL_RPATH_USE_LINK_PATH:BOOL=ON' '-DCMAKE_INSTALL_RPATH:STRING=/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/babelstream-5.0-enzenbzkm6jy4hiy3oixso3ybwjv3jni/lib;/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/babelstream-5.0-enzenbzkm6jy4hiy3oixso3ybwjv3jni/lib64;/cm/shared/apps/cuda11.2/toolkit/11.2.0/lib64' '-DCMAKE_PREFIX_PATH:STRING=/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/thrust-1.16.0-4vzbtqauvqmgrogstre4xb4noiiwi5sg;/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/gcc-13.1.0/cmake-3.27.7-vscc6vyb4iqwb3lzzwt64rsla7cv3gog;/cm/shared/apps/cuda11.2/toolkit/11.2.0;/cm/shared/apps/cuda11.2/toolkit/11.2.0/targets/x86_64-linux/lib/cmake' '-DMODEL=thrust' '-DTHRUST_IMPL=CUDA' '-SDK_DIR=/lustre/home/br-kolgu/excalibur-tests-upstream/benchmarks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/nvhpc-23.9/thrust-1.16.0-4vzbtqauvqmgrogstre4xb4noiiwi5sg/include' '-DCUDA_ARCH=70' '-DCMAKE_CUDA_COMPILER=/cm/shared/apps/cuda11.2/toolkit/11.2.0/bin/nvcc' '-DCMAKE_CUDA_FLAGS=-ccbin /lustre/home/br-kolgu/spack/lib/spack/env/nvhpc/nvc' '-DBACKEND=CUDA' '-DCUDA_EXTRA_FLAGS=-allow-unsupported-compiler' 
'/var/tmp/pbs.83122.gw4head/br-kolgu/spack-stage/spack-stage-babelstream-5.0-enzenbzkm6jy4hiy3oixso3ybwjv3jni/spack-src'

1 error found in build log:
     55       BACKEND = `CUDA`
     56       MANAGED = `OFF`
     57       CMAKE_CUDA_COMPILER = `/cm/shared/apps/cuda11.2/toolkit/11.2.0/bi
           n/nvcc`
     58       CUDA_ARCH = `70`
     59       CUDA_EXTRA_FLAGS = `-allow-unsupported-compiler`
     60    
  >> 61    CMake Error at /lustre/home/br-kolgu/excalibur-tests-upstream/benchm
           arks/spack/isambard-macs/volta/opt/cray-rhel8-cascadelake/gcc-13.1.0
           /cmake-3.27.7-vscc6vyb4iqwb3lzzwt64rsla7cv3gog/share/cmake-3.27/Modu
           les/CMakeDetermineCompilerId.cmake:753 (message):
     62      Compiling the CUDA compiler identification source file
     63      "CMakeCUDACompilerId.cu" failed.
     64    
     65      Compiler: /cm/shared/apps/cuda11.2/toolkit/11.2.0/bin/nvcc
     66    
     67      Build flags:

See build log for details:
  /var/tmp/pbs.83122.gw4head/br-kolgu/spack-stage/spack-stage-babelstream-5.0-enzenbzkm6jy4hiy3oixso3ybwjv3jni/spack-build-out.txt

The run command I use is:

reframe -c benchmarks/apps/babelstream -r --tag thrust --system=isambard-macs:volta --setvar=num_cpus_per_task=40 -S build_locally=false -Sspack_spec='babelstream%nvhpc@23.9 +thrust thrust_backend=cuda cuda_arch=70 backend=cuda flags=-allow-unsupported-compiler'

Since we are working on a newer version of the Spack package for BabelStream, the version used here is the one from https://github.com/spack/spack/pull/41019

It hasn't been merged yet. If it helps: the other models build fine with gcc, but anything that uses the oneapi or nvhpc compilers fails to compile.

I can provide more information if needed.

kaanolgu avatar Feb 02 '24 15:02 kaanolgu

@giordano Do you maybe have a hint about this? So far, I'm not sure if ReFrame is at fault here.

vkarak avatar Feb 15 '24 21:02 vkarak

@vkarak @giordano Actually, I was able to reproduce the issue with a Spack environment too.

# This is a Spack Environment file.
#
# It describes a set of packages to be installed, along with
# configuration settings.
spack:
  # add package specs to the `specs` list
  specs:
  - cuda
  # - babelstream%[email protected]+cuda cuda_arch=70 # works
  - babelstream%[email protected]+cuda cuda_arch=70 mem=managed #works
  view: true
  include:
  - ./compilers.yaml
  packages:
    gmake:
      externals:
      - spec: [email protected]
        prefix: /usr
  concretizer:
    unify: true

When I use this spack.yaml file, it builds without any issues, but when I comment out the "- cuda" line it gives me the same error message:

1 error found in build log:
     54      specific short-term circumstances.  Projects should be ported to the NEW
     55      behavior and not rely on setting a policy to OLD.
     56    Call Stack (most recent call first):
     57      CMakeLists.txt:196 (setup)
     58    
     59    
  >> 60    CMake Error at /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/gcc-13.1.0/cmake-3.27.7-utysvikmqbmtirlmusucjwj4w536xjt2/share/cmake-3.27/Modules/CMakeDete
           rmineCompilerId.cmake:753 (message):
     61      Compiling the CUDA compiler identification source file
     62      "CMakeCUDACompilerId.cu" failed.
     63    
     64      Compiler:
     65      /lustre/home/br-kolgu/spack/opt/spack/cray-rhel8-broadwell/nvhpc-23.9/cuda-10.0.130-kytmfgpgrgojj5fu3m26ozwp2gpo7avh/bin/nvcc
     66    

I can also share the concretization output if needed.

kaanolgu avatar Feb 15 '24 23:02 kaanolgu
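
Since the failure shows up with a Spack environment alone, the last two steps of rfm_build.sh can be replayed by hand to take ReFrame out of the picture. A minimal sketch, assuming the environment directory already contains a spack.yaml; the function name replay_spack_steps and the RUN switch are hypothetical. Only the spack -e ... add and spack -e ... install invocations are taken from the generated script.

```shell
#!/bin/bash
# Hypothetical helper (not part of Spack or ReFrame): replay the two Spack
# steps from rfm_build.sh against an existing environment directory.
# By default it only prints the commands; set RUN=1 to execute them.
replay_spack_steps() {
    local env_dir="$1"
    shift
    if [ "${RUN:-0}" = "1" ]; then
        # Remaining arguments are passed through as the spec to add.
        spack -e "$env_dir" add "$@" && spack -e "$env_dir" install
    else
        echo spack -e "$env_dir" add "$@"
        echo spack -e "$env_dir" install
    fi
}
```

In print mode the output is for inspection only, since shell quoting of the spec is not preserved. If the same CMake error appears when the printed commands are run outside a ReFrame stage directory, that points at the Spack configuration or package rather than at ReFrame.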

I will close this as it's not clear that it is a ReFrame issue. Feel free to reopen it if you have more evidence that ReFrame is at fault here.

vkarak avatar May 14 '24 20:05 vkarak