
Nightly Trilinos Cuda build errors - various perf_test/sparse/KokkosSparse_spiluk.cpp: error: identifier ... is undefined

Open ndellingwood opened this issue 2 years ago • 7 comments

Nightly Cuda builds of Trilinos are failing to compile the KokkosSparse_spiluk.cpp perf test with cuda/9.2.88 and cuda/10.1.105:

Error snip:

17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(273): error: identifier "status" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: identifier "handle" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: identifier "descr" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "info" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "policy" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "pBuffer" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: argument of type "std::size_t *" is incompatible with parameter of type "const int *"
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(282): error: identifier "structural_zero" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(290): error: argument of type "std::size_t *" is incompatible with parameter of type "const int *"
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(298): error: identifier "numerical_zero" is undefined

@jgfouca could any of your changes #1356 possibly impact this perf test?

Reproducer (kokkos-dev):

git clone -b kokkos-promotion https://github.com/trilinos/Trilinos.git
# Symbolic link to your kokkos and kokkos-kernels repos in Trilinos source directory for source override
cd Trilinos
ln -s <path-to-your-repo>/kokkos kokkos
ln -s <path-to-your-repo>/kokkos-kernels kokkos-kernels

cd $HOME
mkdir -p build
cd build

# Environment and configure (TRILINOS_DIR and KOKKOS_DIR must already point at the Trilinos and kokkos source trees)
export ATDM_CONFIG_REGISTER_CUSTOM_CONFIG_DIR=${TRILINOS_DIR}/cmake/std/atdm/contributed/kokkos-dev
source ${TRILINOS_DIR}/cmake/std/atdm/load-env.sh kokkos-dev-cuda-opt
export OMPI_CXX=$KOKKOS_DIR/bin/nvcc_wrapper

cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DCMAKE_INSTALL_PREFIX="${PWD}/install" \
 -DCMAKE_CXX_STANDARD="14" \
 -DTrilinos_ENABLE_TESTS=OFF \
 -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
 -DTrilinos_ENABLE_Kokkos=ON \
  -DKokkos_ARCH_KEPLER35=ON \
 -DTrilinos_ENABLE_KokkosKernels=ON \
  -DKokkosKernels_ENABLE_TESTS=ON \
 -DKokkos_ENABLE_CUDA=ON \
 -DKokkos_SOURCE_DIR_OVERRIDE:STRING=kokkos \
 -DKokkosKernels_SOURCE_DIR_OVERRIDE:STRING=kokkos-kernels \
$TRILINOS_DIR

ndellingwood, Mar 22 '22 02:03

@ndellingwood , I don't think so since I did not touch the spiluk test in #1356 , but I could try to confirm. Is this on weaver?

jgfouca, Mar 22 '22 18:03

@jgfouca it can be reproduced with a Cuda build on any system, the reproducer above is for kokkos-dev

ndellingwood, Mar 22 '22 18:03

The source code in the cpp file doesn't appear to have changed, so some configuration option must have changed such that this guard is no longer true:

#if defined(KOKKOSKERNELS_INST_ORDINAL_INT) && \
    defined(KOKKOSKERNELS_INST_OFFSET_INT)

This guard wraps the definitions of the cusparse types, but the cusparse-tested sections later in the file are not wrapped in the same guard, so they call cusparse routines using types and variables that were never declared. I can put in a PR that adds the guard above to each cusparse region; hopefully that is the right thing to do here.

ndellingwood, Mar 23 '22 01:03

@ndellingwood , I don't know if you fixed something, but I was not able to reproduce this build error on kokkos-dev using the steps you provided:

[ 97%] Built target KokkosKernels_KokkosBlas3_perf_test
Scanning dependencies of target KokkosKernels_blas_cuda
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_cuda.dir/Test_Main.cpp.o
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_cuda.dir/cuda/Test_Cuda_Blas.cpp.o
[ 97%] Linking CXX executable KokkosKernels_common_cuda.exe
[ 97%] Built target KokkosKernels_common_cuda
Scanning dependencies of target KokkosKernels_blas_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_serial.dir/Test_Main.cpp.o
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_serial.dir/serial/Test_Serial_Blas.cpp.o
[ 97%] Linking CXX executable KokkosKernels_graph_serial.exe
[ 97%] Built target KokkosKernels_graph_serial
Scanning dependencies of target KokkosKernels_batched_sla_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_batched_sla_serial.dir/Test_Main.cpp.o
[ 97%] Linking CXX executable KokkosKernels_sparse_serial.exe
[ 97%] Built target KokkosKernels_sparse_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_batched_sla_serial.dir/serial/Test_Serial_Batched_Sparse.cpp.o
[ 98%] Linking CXX executable KokkosKernels_batched_sla_serial.exe
[ 98%] Built target KokkosKernels_batched_sla_serial
[ 99%] Linking CXX executable KokkosKernels_blas_cuda.exe
[ 99%] Built target KokkosKernels_blas_cuda
[ 99%] Linking CXX executable KokkosKernels_batched_dla_serial.exe
[ 99%] Built target KokkosKernels_batched_dla_serial
[100%] Linking CXX executable KokkosKernels_blas_serial.exe
[100%] Built target KokkosKernels_blas_serial
[100%] Linking CXX executable KokkosKernels_graph_cuda.exe
[100%] Linking CXX executable KokkosKernels_sparse_cuda.exe
[100%] Built target KokkosKernels_graph_cuda
[100%] Built target KokkosKernels_sparse_cuda
[100%] Linking CXX executable KokkosKernels_batched_dla_cuda.exe
[100%] Built target KokkosKernels_batched_dla_cuda

jgfouca, Mar 23 '22 17:03

@jgfouca I put in a PR with a fix, but thanks for checking. In your build, did you add the symbolic links in Trilinos to updated kokkos and kokkos-kernels repos for the source override? If kokkos-kernels was a bit out of date, that may explain the failure to reproduce.

ndellingwood, Mar 24 '22 01:03

@ndellingwood , you're right. I forgot to set my KK to develop after I cloned.

jgfouca, Mar 24 '22 19:03

@ndellingwood , for what it's worth, this build error was not introduced by #1356 . I set my KK repo to:

commit 6bb39275f4089e65fbaa8c8deae1ebe00454f755
Merge: ec6cf57 e634bd5
Author: Luc Berger <[email protected]>
Date:   Fri Mar 18 10:16:32 2022 -0600

    Merge pull request #1356 from jgfouca/jgfouca/minor_test_cleanup
    
    A couple newer sparse tests were not following the new testing pattern

... and the build worked fine. This is a relief to me since my PR was purely code cleanup and should not have changed semantics. If you want, I can bisect to find the exact KK PR that caused the problem, or we can just move on.

jgfouca, Mar 24 '22 20:03