Christian Trott

Results 519 comments of Christian Trott

@ndellingwood which machine was this failing on and is it still an issue?

Does this problem go away with the latest cmake? This looks like something cmake should fix, but we can try to add a workaround.

Note that turning LIBDL off the way you did disables the ability to load profiling tools at runtime.

Generally the profiling tools are pretty relevant for downstream users, because they allow proper kernel names like KokkosKernels::gemv etc. to show up inside of the major third party profiling tools...
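To illustrate why LIBDL matters here: loading a tool at runtime generally comes down to the POSIX `dlopen`/`dlsym` calls that libdl provides. The following is only a minimal sketch of that mechanism, not the Kokkos implementation; the helper name `find_tool_callback` is hypothetical, and the library path and symbol name are whatever the caller passes in.

```cpp
#include <dlfcn.h>  // dlopen, dlsym, dlclose, dlerror (POSIX dynamic loading)
#include <cstdio>

// Hypothetical helper, not Kokkos code: open a shared library at runtime and
// look up one callback by name. Returns nullptr if either step fails.
void* find_tool_callback(const char* library_path, const char* symbol_name) {
  void* handle = dlopen(library_path, RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    const char* reason = dlerror();
    std::fprintf(stderr, "dlopen failed: %s\n", reason ? reason : "unknown");
    return nullptr;
  }
  void* callback = dlsym(handle, symbol_name);
  if (callback == nullptr) {
    std::fprintf(stderr, "dlsym found no symbol named %s\n", symbol_name);
    dlclose(handle);  // nothing usable in this library, so release it
  }
  // On success the handle is deliberately left open so the callback stays valid.
  return callback;
}
```

Without libdl none of these calls are available, so a tool library cannot be attached to an already built executable. On older glibc versions this sketch needs `-ldl` at link time.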

What about this?
```diff
diff --git a/cmake/kokkos_tpls.cmake b/cmake/kokkos_tpls.cmake
index 54c6b520b..4dfdb6457 100644
--- a/cmake/kokkos_tpls.cmake
+++ b/cmake/kokkos_tpls.cmake
@@ -77,7 +77,7 @@ ENDIF()
 KOKKOS_IMPORT_TPL(HWLOC)
 KOKKOS_IMPORT_TPL(LIBNUMA)
 KOKKOS_IMPORT_TPL(LIBRT)
-KOKKOS_IMPORT_TPL(LIBDL)
+#KOKKOS_IMPORT_TPL(LIBDL)
 KOKKOS_IMPORT_TPL(MEMKIND)
 IF (NOT WIN32)
 KOKKOS_IMPORT_TPL(THREADS...
```

Hm, I wasn't able to reproduce it. How did you run it? (Also, I disabled the MKL run.)

Even with MKL enabled I was not able to reproduce it. Develop:
```
GPU activities: 58.91% 10.3518s 8 1.29398s 1.28917s 1.30694s _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForIZ12batched_gelsILb0EEvP13cublasContextiiEUliiE_NS_13MDRangePolicyIJNS_4CudaENS_4RankILj2ELNS_7IterateE0ELSA_0EEEEEES8_EEEEvT_
                40.34% 7.08940s 1275 5.5603ms 1.5869ms 169.45ms void geqrSolve_batch_kernel(int,...
```

However, I do see an overall time difference with MKL in the picture: 15 min vs. 11 min, which is not explained by anything I see in the nvprof output.

I don't know; it's a long-standing expected behavior after all.

Also, any idea if it happens with clang 14 and earlier?