compile failure due to missing file KokkosLapack_tpl_spec.hpp

Open glhenni opened this issue 1 year ago • 14 comments

I'm getting the following error when building our application:

In file included from /scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/impl/KokkosLapack_gesv_spec.hpp:130,
                 from /scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/lapack/eti/generated_specializations_cpp/gesv/Lapack_gesv_eti_DOUBLE_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE.cpp:20:
/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/tpls/KokkosLapack_gesv_tpl_spec_decl.hpp:278:10: fatal error: KokkosLapack_tpl_spec.hpp: No such file or directory

This is kokkos and kokkos-kernels as contained within Trilinos/develop as of this morning.

The entire compile line is pretty messy, but here it is:

/projects/cde/v3/cee/spack/opt/spack/linux-rhel7-x86_64/gcc-10.3.0/openmpi-4.1.2-ated23f2cr5tikdwru4hsr7pl25jk2bp/bin/mpicxx -DKOKKOS_DEPENDENCE -I/projects/gemma_user/magma-2.6.2/cuda-11.2.2/include -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/blas -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/blas -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/lapack -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/graph -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/graph -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/ode -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/common/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/common/impl 
-I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/common/unit_test -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/batched/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/dense/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/dense/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/dense/unit_test -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/sparse/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/sparse/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/batched/sparse/unit_test -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/blas/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/blas/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/blas/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/blas/eti 
-I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/blas/tpls -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/lapack/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/lapack/tpls -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/graph/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/graph/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/graph/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/graph/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/sparse/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/sparse/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/sparse/eti 
-I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/sparse/eti -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/sparse/tpls -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/ode/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/ode/impl -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos-kernels/ode/unit_test -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos/core/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos/core/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos/core/src/../../tpls/desul/include -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos/containers/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos/containers/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos/algorithms/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos/algorithms/src 
-I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos/simd/src -I/scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/Trilinos/packages/kokkos/simd/src -pedantic -Wall -Wno-long-long -Wwrite-strings    -extended-lambda -Wext-lambda-captures-this -arch=sm_70  -DADD_ -fopenmp -lgfortran -O3 -DNDEBUG -w -extended-lambda -Wext-lambda-captures-this -arch=sm_70 -std=c++17 -MD -MT packages/kokkos-kernels/CMakeFiles/kokkoskernels.dir/lapack/eti/generated_specializations_cpp/gesv/Lapack_gesv_eti_DOUBLE_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE.cpp.o -MF CMakeFiles/kokkoskernels.dir/lapack/eti/generated_specializations_cpp/gesv/Lapack_gesv_eti_DOUBLE_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE.cpp.o.d -o CMakeFiles/kokkoskernels.dir/lapack/eti/generated_specializations_cpp/gesv/Lapack_gesv_eti_DOUBLE_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE.cpp.o -c /scratch/gemmaops/jenkins/workspace/trilinos-ascicgpu-cde-cuda-openmp-static-release/build/ascicgpu/trilinos_cde_v3-gnu-OpenMPI-cuda_11_2-OpenMP-static-Release/packages/kokkos-kernels/lapack/eti/generated_specializations_cpp/gesv/Lapack_gesv_eti_DOUBLE_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE.cpp

glhenni • Nov 21 '23 18:11
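
One quick way to check an error like this is to ask which of the `-I` directories on the compile line actually contain the header. The Python sketch below does that. `include_dirs` and `find_header` are hypothetical helper names, not part of Kokkos Kernels or Trilinos, and the sketch assumes the command can be tokenized with POSIX shell rules. GCC also searches the directory of the including file for quoted includes, which this sketch does not model.

```python
import shlex
from pathlib import Path


def include_dirs(compile_command: str) -> list[str]:
    """Return the -I directories of a compiler command, in search order."""
    dirs = []
    tokens = shlex.split(compile_command)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Form "-I <dir>": the directory is the next token.
            dirs.append(tokens[i + 1])
            i += 1
        elif tok.startswith("-I") and len(tok) > 2:
            # Form "-I<dir>": the directory is glued to the flag.
            dirs.append(tok[2:])
        i += 1
    return dirs


def find_header(compile_command: str, header: str) -> list[str]:
    """Return every -I directory that actually contains `header`."""
    return [
        d for d in include_dirs(compile_command)
        if (Path(d) / header).is_file()
    ]
```

Run on the build machine against the compile line above, `find_header(cmd, "KokkosLapack_tpl_spec.hpp")` would return an empty list if none of the listed directories holds the file.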

Pulling in @vqd8a who might know more, or be more helpful in answering any questions...

glhenni • Nov 21 '23 18:11

@glhenni can you share your CMake configuration line for Trilinos?

ndellingwood • Nov 21 '23 19:11

@ndellingwood In Kokkos Kernels 4.1, when gesv was in blas, KokkosBlas_tpl_spec.hpp was included in KokkosLapack_gesv_tpl_spec_decl.hpp. Now that gesv has moved to the lapack directory in Kokkos Kernels 4.2, we should have an equivalent file, KokkosLapack_tpl_spec.hpp. @lucbv

vqd8a • Nov 21 '23 19:11

@vqd8a yeah, KokkosLapack_tpl_spec.hpp is not present. Looks like enabling CUSOLVER or MAGMA should reproduce the failure.

ndellingwood • Nov 21 '23 19:11

It's tricky because we keep a lot of our settings in one or more settings files rather than specifying them all via -DCMAKE_VAR=VALUE in the cmake invocation itself. If I had to guess, the one that's biting us is -DKokkosKernels_ENABLE_TPL_MAGMA=ON. We are using MAGMA and cmake is finding it fine.

If you REALLY need me to generate a standalone cmake command line for our build let me know. I'll translate our settings.cmake file into the equivalent cmake -D invocation.

glhenni • Nov 21 '23 19:11
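
For readers in a similar situation, translating a settings file into `-D` flags can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the actual settings.cmake is not shown in this thread, so the sketch assumes it consists of simple one-line `set(NAME VALUE CACHE TYPE "doc")` entries, and `cache_entries_to_flags` is a hypothetical helper name. Anything more complex (multi-line calls, variables, conditionals) is silently skipped.

```python
import re

# Matches simple one-line cache entries such as:
#   set(KokkosKernels_ENABLE_TPL_MAGMA ON CACHE BOOL "doc string")
# The command name is case-insensitive in CMake; the CACHE keyword is not.
CACHE_SET = re.compile(
    r'^\s*(?i:set)\s*\(\s*(?P<name>[A-Za-z0-9_]+)\s+'
    r'(?P<value>"[^"]*"|\S+)\s+CACHE\s+(?P<type>[A-Z]+)\b'
)


def cache_entries_to_flags(settings_text: str) -> list[str]:
    """Turn simple set(... CACHE ...) lines into -DNAME:TYPE=VALUE flags."""
    flags = []
    for line in settings_text.splitlines():
        match = CACHE_SET.match(line)
        if match is None:
            # Comments, blank lines, and non-cache set() calls are skipped.
            continue
        value = match.group("value").strip('"')
        flags.append(f"-D{match.group('name')}:{match.group('type')}={value}")
    return flags
```

The `-D<var>:<type>=<value>` form is standard cmake command-line syntax. A file of cache entries like this can also be passed to cmake directly with `-C <initial-cache>`, which may make the translation unnecessary.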

> If you REALLY need me to generate a standalone cmake command line for our build let me know. I'll translate our settings.cmake file into the equivalent cmake -D invocation.

No, that shouldn't be necessary. Knowing that MAGMA was enabled is a good clue. Thanks!

ndellingwood • Nov 21 '23 19:11

Looks like @lucbv has a WIP for the cuSOLVER case in #2038. (Edit: fixed the PR number.)

ndellingwood • Nov 21 '23 20:11

@ndellingwood @lucbv There is a new KokkosLapack_cusolver.hpp in that PR. It looks to me like we need a KokkosLapack_magma.hpp too.

vqd8a • Nov 21 '23 20:11

Looks like the MAGMA-related stuff from blas/tpls/KokkosBlas_Cuda_tpl.hpp was copied to lapack/tpls/KokkosLapack_Cuda_tpl, but we're missing the MAGMA-related stuff (and possibly other stuff) from blas/tpls/KokkosBlas_tpl_spec.hpp, which has no lapack counterpart.

ndellingwood • Nov 21 '23 20:11
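
A move like this (gesv going from blas to lapack) can leave quoted includes pointing at headers that no longer exist under any name in the tree. The Python sketch below is one way to audit for that. `dangling_includes` is a hypothetical helper, not a Kokkos Kernels tool, and it makes two simplifying assumptions: headers are matched by file name only, and every `#include "..."` is reported regardless of preprocessor guards. That means it can flag an include that sits behind a TPL guard, and it will also report headers that live outside the scanned tree or are generated at configure time.

```python
import re
from pathlib import Path

# Quoted includes only; angle-bracket includes are treated as external.
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def dangling_includes(root, suffixes=(".hpp", ".cpp", ".h")):
    """Map each unresolved quoted include to the files that reference it."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes]
    known = {p.name for p in files}  # header names present anywhere in the tree
    missing = {}
    for path in files:
        text = path.read_text(errors="replace")
        for inc in INCLUDE.findall(text):
            if Path(inc).name not in known:
                missing.setdefault(inc, []).append(
                    path.relative_to(root).as_posix()
                )
    return missing
```

Pointed at a checkout of the kokkos-kernels package directory, this would list each unresolved header next to the files that include it, which is a cheap check to run after relocating a component.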

@ndellingwood the work I have in the cuSOLVER PR will eventually fix the problem but I do not think that it would be appropriate to cherry pick for a patch release though. I see that your commits above are more strategic and pull in only the small subset needed. Let me know if you need help with it? Ultimately we need to get MAGMA on Weaver or Caraway to reproduce this...

lucbv • Nov 21 '23 22:11

> Let me know if you need help with it?

@lucbv thanks, the updates seemed straightforward but I'm still hitting compilation errors. I may follow up with you for some help if I get stuck.

> we need to get MAGMA on Weaver or Caraway to reproduce this

yeah, I built my own copy on Weaver for now. I'll put in a request for the TPL and share the config (they'll use spack, but the config should help with the recipe).

ndellingwood • Nov 21 '23 22:11

> the work I have in the cuSOLVER PR will eventually fix the problem but I do not think that it would be appropriate to cherry pick for a patch release though.

@lucbv yeah, and since it follows other updates (e.g. rocsolver) we'd probably have to pull in extra stuff for a clean patch. For an eventual Trilinos patch, in addition to the MAGMA fixes, should we for now remove the cusolver stub from the lapack updates (in particular, KokkosLapack_Cuda_tpl.hpp, which includes the problematic KokkosLapack_tpl_spec.hpp)?

ndellingwood • Nov 21 '23 22:11

Yeah, if that can be removed without breaking anything, that would be good actually. As long as MAGMA still works, that should be good enough.

lucbv • Nov 21 '23 23:11

A fix for 4.2.00 (against release-candidate-4.2.01) was issued in #2044, along with Trilinos PR https://github.com/trilinos/Trilinos/pull/12555. @glhenni, hopefully these resolve the issue in Trilinos (the PR is set for AUTOMERGE).

ndellingwood • Nov 22 '23 21:11