BlockSPGEMM build error

Open lucbv opened this issue 3 years ago • 11 comments

There is a first issue related to variable shadowing in a Cuda10 build on kokkos-dev2:

In file included from /home/jenkins/jenkins-new/workspace/KokkosKernels_KokkosDev2_CLANG8_CUDA10_cpp14/kokkos-kernels/unit_test/serial/Test_Serial_Sparse.cpp:5:
In file included from /home/jenkins/jenkins-new/workspace/KokkosKernels_KokkosDev2_CLANG8_CUDA10_cpp14/kokkos-kernels/unit_test/sparse/Test_Sparse.hpp:15:
/home/jenkins/jenkins-new/workspace/KokkosKernels_KokkosDev2_CLANG8_CUDA10_cpp14/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:176:25: error: declaration shadows a variable in the global namespace [-Werror,-Wshadow]
void test_bspgemm(lno_t blockDim, lno_t m, lno_t k, lno_t n, size_type nnz,
                        ^
/home/projects/x86-64/clang/8.0/lib/clang/8.0.0/include/__clang_cuda_builtin_vars.h:114:46: note: previous declaration is here
__CUDA_BUILTIN_VAR __cuda_builtin_blockDim_t blockDim;
                                             ^
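
A common way to clear a `-Wshadow` error of this kind is to rename the parameter so it no longer hides the global `blockDim` that the CUDA headers declare. The sketch below is illustrative only: `Dim3Like`, `block_entries`, `block_dim` and `num_blocks` are hypothetical names, the global is a stand-in for the CUDA built-in, and this is not necessarily the change made in the actual fix.

```cpp
#include <cassert>
#include <cstddef>

#if !defined(__CUDACC__)
// Hypothetical stand-in for the global `blockDim` that the CUDA headers
// declare; only defined here so the sketch builds as plain C++.
struct Dim3Like {
  unsigned x = 1, y = 1, z = 1;
};
Dim3Like blockDim;
#endif

// Before (rejected under -Werror,-Wshadow because the parameter hides ::blockDim):
//   std::size_t block_entries(int blockDim, int numBlocks);
// After: the parameter is renamed, so nothing in the global namespace is shadowed.
inline std::size_t block_entries(int block_dim, int num_blocks) {
  assert(block_dim > 0 && num_blocks >= 0);
  // Each block is a dense block_dim x block_dim tile.
  return static_cast<std::size_t>(block_dim) * block_dim * num_blocks;
}
```

With the rename, the function body is unchanged apart from the identifier, so the fix carries little risk of altering behavior.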

A second issue is related to the use of auto in an Intel 17 dbg build:

<https://jenkins-son.sandia.gov/job/KokkosKernels_OSRHEL7_Intel17_0_1-dbg/ws/kokkos-kernels/src/common/KokkosKernels_BlockUtils.hpp(133)>: error #3373: nonstandard use of "auto" to both deduce the type from an initializer and to announce a trailing return type
        for (auto va = valA + row_offset, end = va + block_dim; va < end; ++va) {
             ^
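
The loop above is standard C++11 (both declarators deduce the same pointer type), but this Intel 17 build rejects it. A possible workaround, sketched under the assumption that the loop simply walks one row of a dense block, is to spell out the pointer type and hoist `end` out of the `for` initializer. The helper name `sum_block_row` and its signature are hypothetical and not taken from `KokkosKernels_BlockUtils.hpp`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums block_dim consecutive values starting at
// valA + row_offset, without a multi-declarator `auto` in the for-init.
template <typename T>
T sum_block_row(const T* valA, std::size_t row_offset, std::size_t block_dim) {
  assert(valA != nullptr);
  T sum{};
  // `end` is declared on its own line with an explicit type, so the compiler
  // never sees `auto a = ..., b = ...`.
  const T* const end = valA + row_offset + block_dim;
  for (const T* va = valA + row_offset; va < end; ++va) {
    sum += *va;
  }
  return sum;
}
```

Keeping a single declarator per `auto` statement would likely work as well; the explicit type is used here only to make the deduced type obvious.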

lucbv avatar May 02 '22 13:05 lucbv

Adding @MikolajZuzek. I will reproduce this on our machines and, unless I find more problems, submit a simple fix for these two small issues.

lucbv avatar May 02 '22 13:05 lucbv

Also adding @ndellingwood to avoid filing this twice : )

lucbv avatar May 02 '22 13:05 lucbv

@lucbv I am also seeing some runtime test failures after the merge of PR #1099 (no changes were merged to kokkos the day this test began failing), for example in a cuda/10.0 build with rdc and uvm enabled:

08:38:11 4: [ RUN      ] cuda.sparse_block_spgemm_kokkos_complex_double_int_int_TestExecSpace
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:274: Failure
08:38:11 4: Value of: is_expected_to_fail
08:38:11 4:   Actual: false
08:38:11 4: Expected: true
08:38:11 4: SPGEMM_KK: Kokkos::Impl::ParallelFor< Cuda > requested too large team size.
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:277: Failure
08:38:11 4: Value of: failed
08:38:11 4:   Actual: true
08:38:11 4: Expected: is_expected_to_fail
08:38:11 4: Which is: false
08:38:11 4: entries are different.
08:38:11 4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... ... ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
08:38:11 4: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ... ... ... 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:285: Failure
08:38:11 4: Value of: is_identical
08:38:11 4:   Actual: false
08:38:11 4: Expected: true
08:38:11 4: SPGEMM_KK
08:38:11 4: [  FAILED  ] cuda.sparse_block_spgemm_kokkos_complex_double_int_int_TestExecSpace (8487 ms)
08:38:11 4: [ RUN      ] cuda.sparse_block_spgemm_kokkos_complex_double_int_size_t_TestExecSpace
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:274: Failure
08:38:11 4: Value of: is_expected_to_fail
08:38:11 4:   Actual: false
08:38:11 4: Expected: true
08:38:11 4: SPGEMM_KK: Kokkos::Impl::ParallelFor< Cuda > requested too large team size.
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:277: Failure
08:38:11 4: Value of: failed
08:38:11 4:   Actual: true
08:38:11 4: Expected: is_expected_to_fail
08:38:11 4: Which is: false
08:38:11 4: entries are different.
08:38:11 4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... ... ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
08:38:11 4: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ... ... ... 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 
08:38:11 4: /home/jenkins/weaver-new/workspace/KokkosKernels_Weaver_Cuda_cuda_100_gcc_740_rdc-uvm/kokkos-kernels/unit_test/sparse/Test_Sparse_bspgemm.hpp:285: Failure
08:38:11 4: Value of: is_identical
08:38:11 4:   Actual: false
08:38:11 4: Expected: true
08:38:11 4: SPGEMM_KK
08:38:11 4: [  FAILED  ] cuda.sparse_block_spgemm_kokkos_complex_double_int_size_t_TestExecSpace (8494 ms)

Reproducer (weaver):

module load cmake/3.19.3 cuda/10.0.130 ibm/xl/16.1.1 gcc/7.4.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Cuda,Serial --arch=Power9,Volta70 --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="14" --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft --with-cuda-options=enable_lambda,uvm,rdc   --no-examples

Edit: Also occurs with cuda/9.2.88 on the same system

ndellingwood avatar May 02 '22 17:05 ndellingwood

Thanks for letting me know, I will look at that after the build issues are resolved.

lucbv avatar May 02 '22 17:05 lucbv

@lucbv also seeing a separate compilation error with cuda/10.1, Cuda_OpenMP build on kokkos-dev-2:

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: calling a __host__ function(" const") from a __device__ function("Kokkos::Impl::ParallelFor< ::,  ::Kokkos::RangePolicy< ::Kokkos::Cuda > ,  ::Kokkos::Cuda> ::operator () const") is not allowed

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: identifier " const" is undefined in device code

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: calling a __host__ function(" const") from a __device__ function("Kokkos::Impl::ParallelFor< ::,  ::Kokkos::RangePolicy< ::Kokkos::Cuda > ,  ::Kokkos::Cuda> ::operator () const") is not allowed

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: identifier " const" is undefined in device code

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: calling a __host__ function(" const") from a __device__ function("Kokkos::Impl::ParallelFor< ::,  ::Kokkos::RangePolicy< ::Kokkos::Cuda > ,  ::Kokkos::Cuda> ::operator () const") is not allowed

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: identifier " const" is undefined in device code

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: calling a __host__ function(" const") from a __device__ function("Kokkos::Impl::ParallelFor< ::,  ::Kokkos::RangePolicy< ::Kokkos::Cuda > ,  ::Kokkos::Cuda> ::operator () const") is not allowed

/home/ndellin/kokkos-kernels/testing/Cuda101/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel_Range.hpp(111): error: identifier " const" is undefined in device code

8 errors detected in the compilation of "/tmp/tmpxft_0003dc49_00000000-6_Test_Cuda_Sparse.cpp1.ii".
make[2]: *** [unit_test/CMakeFiles/KokkosKernels_sparse_cuda.dir/cuda/Test_Cuda_Sparse.cpp.o] Error 1

Failing with PR #1099.
Passing with SHA of previous merge commit: ca33f614218b5cfc46796fda82ead414bdc4daf0

Reproducer (kokkos-dev-2):

source /projects/sems/modulefiles/utils/sems-archive-modules-init.sh ; module use /home/projects/x86-64/modulefiles/local
module purge
module load sems-archive-env sems-archive-cmake/3.17.1 sems-archive-gcc/7.3.0 sems-archive-cuda/10.1

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Cuda,OpenMP --arch=SNB,Volta70 --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="14" --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft --no-examples

ndellingwood avatar May 02 '22 18:05 ndellingwood

Hum, someone forgot to add a KOKKOS_FUNCTION or KOKKOS_INLINE_FUNCTION somewhere...
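
For context, errors of the form "calling a __host__ function from a __device__ function" typically appear when a helper invoked inside a device functor lacks the host/device annotation that `KOKKOS_FUNCTION` or `KOKKOS_INLINE_FUNCTION` provides. The sketch below uses a stand-in macro, `SKETCH_INLINE_FUNCTION`, so that it builds without Kokkos. The macro, `block_offset` and `FillOffsets` are all hypothetical and do not come from the failing code.

```cpp
#include <cassert>

// Stand-in for KOKKOS_INLINE_FUNCTION (an assumption made for this sketch):
// marks a function callable from both host and device when compiled as CUDA.
#if defined(__CUDACC__)
#define SKETCH_INLINE_FUNCTION __host__ __device__ inline
#else
#define SKETCH_INLINE_FUNCTION inline
#endif

// Hypothetical helper. Without the annotation, a CUDA compiler treats it as
// host-only, and calling it from device code produces the error shown above.
SKETCH_INLINE_FUNCTION int block_offset(int row, int block_dim) {
  return row * block_dim * block_dim;
}

// Hypothetical functor of the kind a parallel_for would run on the device.
struct FillOffsets {
  int* out;
  int block_dim;

  SKETCH_INLINE_FUNCTION void operator()(int i) const {
    out[i] = block_offset(i, block_dim);
  }
};
```

On the host the functor can be exercised with an ordinary loop, which is a cheap way to check the logic before chasing the missing annotation in a device build.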

lucbv avatar May 02 '22 19:05 lucbv

@lucbv thanks for PR #1396. This cleared up some of the nightlies, but there are still failures with cuda/10.0 and cuda/10.1 with rdc+uvm enabled on Power9+Volta (i.e. Weaver): https://github.com/kokkos/kokkos-kernels/issues/1395#issuecomment-1115151520

ndellingwood avatar May 04 '22 16:05 ndellingwood

@ndellingwood yeah, I was trying to get us through the build issues. I am not 100% sure why rdc+uvm creates runtime failures and will need a bit more time to investigate that. Hopefully I can look into it tomorrow or Friday. Sorry for taking time with this, but at least it's being worked on : )

lucbv avatar May 04 '22 16:05 lucbv

Thanks @lucbv !

ndellingwood avatar May 04 '22 17:05 ndellingwood

I opened a separate issue, #1413, for the rdc+uvm failures (https://github.com/kokkos/kokkos-kernels/issues/1395#issuecomment-1115151520), to keep them cleanly separated from the build errors in the original post of this issue.

ndellingwood avatar May 18 '22 17:05 ndellingwood

This should now be fixed with PR #1470 merged. This issue also seems to be a duplicate of #1413.

lucbv avatar Jul 19 '22 13:07 lucbv