Daniel Arndt
> @masterleinad are there builds of Trilinos with the hpx backend? Or do you mean internal kokkos tests failing with hpx?

No, there are no test failures for Kokkos+HPX but...
@ndellingwood Would you mind checking if
```diff
diff --git a/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp b/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp
index 4cf1a13a740..e09d0e76ab0 100644
--- a/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp
+++ b/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp
@@ -5846,10 +5846,8 @@ namespace Tpetra {
 }
 execute_sync_host_uvm_access(); // protect host...
```
The relevant backtrace is
```
#0 futex_wait (private=0, expected=2, futex_word=0x55555bb44210) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x55555bb44210, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007ffff53d9002 in lll_mutex_lock_optimized (mutex=0x55555bb44210) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x55555bb44210) at ./nptl/pthread_mutex_lock.c:93
...
```
The culprit here is https://github.com/trilinos/Trilinos/blob/77005adad6d625dbf62009620ffdc4ffa06b9fac/packages/tpetra/core/src/Tpetra_BlockCrsMatrix_def.hpp#L2533, which allocates and deallocates memory inside a `parallel_for` by creating a `View` in https://github.com/trilinos/Trilinos/blob/77005adad6d625dbf62009620ffdc4ffa06b9fac/packages/tpetra/core/src/Tpetra_BlockCrsMatrix_def.hpp#L607-L608. Destroying the `View` triggers a global fence, and that fence is what deadlocks.
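To make the failure mode concrete, here is a toy model in plain C++. It is not Kokkos or Trilinos code: `ToyRuntime`, `ManagedBuffer`, and both helper functions are hypothetical names, and the model only assumes what is stated above, namely that releasing a managed allocation requests a global fence and that a fence requested from inside a running kernel can never complete. Instead of hanging, the model counts the fences that would block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parallel runtime; NOT the Kokkos API.
struct ToyRuntime {
  bool in_parallel_region = false;

  // A global fence waits for all outstanding kernels. If it is requested from
  // inside a running kernel it would wait on itself, so the model reports
  // failure (false) instead of hanging.
  bool fence() const { return !in_parallel_region; }

  template <class F>
  void parallel_for(std::size_t n, F body) {
    assert(!in_parallel_region);  // nested dispatch is out of scope here
    in_parallel_region = true;
    for (std::size_t i = 0; i < n; ++i) body(i);
    in_parallel_region = false;
  }
};

// Hypothetical managed allocation: releasing it requests a global fence.
struct ManagedBuffer {
  ToyRuntime& rt;
  int& blocked_fences;
  std::vector<double> data;

  ManagedBuffer(ToyRuntime& r, std::size_t n, int& blocked)
      : rt(r), blocked_fences(blocked), data(n, 0.0) {}
  ~ManagedBuffer() {
    if (!rt.fence()) ++blocked_fences;  // a real runtime would hang here
  }
};

// Problematic pattern: the buffer is created and destroyed inside the kernel.
int allocate_inside_kernel(ToyRuntime& rt, std::size_t n) {
  int blocked = 0;
  rt.parallel_for(n, [&](std::size_t) { ManagedBuffer scratch(rt, 8, blocked); });
  return blocked;
}

// Possible fix: hoist the allocation so it outlives the kernel.
int allocate_outside_kernel(ToyRuntime& rt, std::size_t n) {
  int blocked = 0;
  {
    ManagedBuffer scratch(rt, 8, blocked);
    rt.parallel_for(n, [&](std::size_t i) { scratch.data[i % 8] += 1.0; });
  }  // scratch is released after the kernel, so its fence can complete
  return blocked;
}
```

In this model every iteration of the first variant hits a fence that cannot complete, while the hoisted variant fences once, after the kernel has finished. Whether hoisting the allocation is the right fix for the Tpetra code itself is a separate question.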
You should get static scheduling in host backends by default. Dynamic scheduling is only used when you specify the `Kokkos::Schedule` property for the respective policy. Do you need more than...
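As a rough illustration of the difference between the two scheduling modes, the sketch below implements both with `std::thread`. It is not how any Kokkos backend is implemented. `run_static`, `run_dynamic`, and the `chunk` parameter are hypothetical names chosen for this example. In Kokkos itself the choice is expressed through the policy, for example `Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>>`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Static scheduling: every worker gets one fixed, contiguous block of indices
// that is decided before any work starts.
template <class F>
void run_static(std::size_t n, unsigned workers, F body) {
  assert(workers > 0);
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    const std::size_t begin = n * w / workers;
    const std::size_t end = n * (w + 1) / workers;
    pool.emplace_back([=] {
      for (std::size_t i = begin; i < end; ++i) body(i, w);
    });
  }
  for (auto& t : pool) t.join();
}

// Dynamic scheduling: workers repeatedly claim the next chunk of indices from
// a shared counter, so faster workers end up processing more chunks.
template <class F>
void run_dynamic(std::size_t n, unsigned workers, std::size_t chunk, F body) {
  assert(workers > 0 && chunk > 0);
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&, w] {
      for (;;) {
        const std::size_t begin = next.fetch_add(chunk);
        if (begin >= n) break;
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t i = begin; i < end; ++i) body(i, w);
      }
    });
  }
  for (auto& t : pool) t.join();
}
```

Both variants visit every index exactly once. The static version has no shared state during the loop, while the dynamic version pays for one atomic update per chunk in exchange for better load balance when iteration costs vary.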
> this doesn't get caught unless the -DKokkos_ENABLE_COMPILER_WARNINGS=ON is set at configure time.

What do compiler warnings have to do with the issue? I would rather have expected that you...
> Working on #7103 regarding the Trilinos include issues; compiling with Cuda enabled I'm seeing lots of these types of warnings:

Also see https://github.com/kokkos/mdspan/pull/350.
`sycl::ext::oneapi::experimental::printf` is also unsupported, and compiling the following tests crashes the compiler:
- Kokkos_CoreUnitTest_SYCL3
- Kokkos_CoreUnitTest_SYCL1B
- Kokkos_CoreUnitTest_SYCL1A
- Kokkos_ContainersUnitTest_SYCL
- Kokkos_AlgorithmsUnitTest_StdSet_C
- Kokkos_AlgorithmsUnitTest_StdSet_Team_B
- Kokkos_AlgorithmsUnitTest_StdSet_Team_C
> No Jenkins build updates?

No, updating our GPU testing strategy requires more thought since our resources are pretty limited. It's not clear how many more configurations we can afford.
> Does this increase the number of builds? How does the matrix thing work?

We use 8 more builds. The matrix just sets up all combinations of certain parameters and...