tests: enable random shuffling on a subset of the CI
Tackling #7435.
13: [ RUN ] defaultdevicetype.reduce_instantiation_c2
13/57 Test #13: Kokkos_CoreUnitTest_Default ................................***Exception: SegFault 0.97 sec
in CUDA RDC build
I could reproduce it using seed 10003. It fails every time (I repeated the test 5 times on my laptop).
Filtering the test cases with --gtest_filter=*reduce_instantiation_c2* (ensuring I'm running that failing test only) also triggers the seg fault.
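For anyone else trying to reproduce this, here is a minimal sketch of the two invocations described above, written as a small Python helper that only builds the argument lists. `--gtest_shuffle`, `--gtest_random_seed` and `--gtest_filter` are standard GoogleTest flags. The binary path `./Kokkos_CoreUnitTest_Default` is an assumption based on the CTest name in the log, and the helper name `gtest_command` is made up for illustration.

```python
from typing import List, Optional


def gtest_command(
    binary: str,
    seed: Optional[int] = None,
    test_filter: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one GoogleTest run (hypothetical helper)."""
    cmd = [binary]
    if seed is not None:
        # Shuffle the test order and pin the seed so the order is reproducible.
        cmd += ["--gtest_shuffle", f"--gtest_random_seed={seed}"]
    if test_filter is not None:
        # Restrict the run to the tests matching the pattern.
        cmd.append(f"--gtest_filter={test_filter}")
    return cmd


# Assumed binary location; adjust to the actual build directory.
BINARY = "./Kokkos_CoreUnitTest_Default"

# Full shuffled run with the seed that triggered the failure.
shuffled_run = gtest_command(BINARY, seed=10003)

# Only the failing test, no shuffling.
single_test_run = gtest_command(BINARY, test_filter="*reduce_instantiation_c2*")
```

Either list can be handed to `subprocess.run(cmd)` in a loop to repeat the run. On POSIX a negative `returncode` means the process was killed by a signal, which is how the seg fault would show up.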
This might help to at least build again: `-Xnvlink --suppress-stack-size-warning`
See: https://github.com/kokkos/kokkos/issues/2039
I am all in favor of this once we resolve that failure
Just managed to get a trace with `cuda-gdb`
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from defaultdevicetype
[ RUN ] defaultdevicetype.reduce_instantiation_c2
Thread 1 "Kokkos_CoreUnit" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(cuda-gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000555555643d96 in __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>::__nv_hdl_wrapper_t(__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> const&) (in=..., this=<optimized out>) at nvcc_internal_extended_lambda_implementation:236
#2 Kokkos::Impl::FunctorAnalysis<Kokkos::Impl::FunctorPatternInterface::REDUCE, Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>, double>::Reducer::Reducer(__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> const&) (arg_functor=..., this=<optimized out>)
at /workspaces/kokkos/build-with-cuda-11.0-nvcc-rdc-install/install/include/impl/Kokkos_FunctorAnalysis.hpp:995
#3 Kokkos::Impl::ParallelReduceAdaptor<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>, double>::execute_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> const&, double&) (return_value=@0x7fffffffd388: 99, functor=..., policy=..., label=...)
at /workspaces/kokkos/build-with-cuda-11.0-nvcc-rdc-install/install/include/Kokkos_Parallel_Reduce.hpp:1525
#4 Kokkos::Impl::ParallelReduceAdaptor<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>, double>::execute<double>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> const&, double&) (return_value=@0x7fffffffd388: 99, functor=..., policy=..., label=...)
at /workspaces/kokkos/build-with-cuda-11.0-nvcc-rdc-install/install/include/Kokkos_Parallel_Reduce.hpp:1555
#5 Kokkos::parallel_reduce<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>, double>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> const&, double&) (return_value=@0x7fffffffd388: 99, functor=..., policy=..., label=...)
at /workspaces/kokkos/build-with-cuda-11.0-nvcc-rdc-install/install/include/Kokkos_Parallel_Reduce.hpp:1687
#6 Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddReturnArgument<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)> >(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>), &(void Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> >(int, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda>)), 1u>, void (int const&, double&)>) (N=1000)
at /workspaces/kokkos/core/unit_test/TestReduceCombinatorical.hpp:353
#7 0x00005555556d8732 in Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> > (N=1000) at /usr/include/c++/9/bits/basic_string.h:940
#8 Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddFunctorLambdaRange<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::RangePolicy<Kokkos::Cuda> > (N=1000) at /workspaces/kokkos/core/unit_test/TestReduceCombinatorical.hpp:495
#9 Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::AddPolicy_2<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (
N=N@entry=1000) at /workspaces/kokkos/core/unit_test/TestReduceCombinatorical.hpp:523
#10 0x00005555556b4afc in Test::TestReduceCombinatoricalInstantiation<Kokkos::Cuda>::execute_c2 () at /usr/include/c++/9/bits/char_traits.h:300
#11 Test::defaultdevicetype_reduce_instantiation_c2_Test::TestBody (this=<optimized out>)
at /workspaces/kokkos/core/unit_test/default/TestDefaultDeviceType_c2.cpp:29
#12 0x0000555555880051 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x5555558f2905 "the test body",
method=<optimized out>, object=0x555559f42f40) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4082
#13 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x555559f42f40, method=<optimized out>,
location=location@entry=0x5555558f2905 "the test body") at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4137
#14 0x0000555555871440 in testing::Test::Run (this=this@entry=0x555559f42f40) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4176
#15 0x00005555558718d5 in testing::Test::Run (this=0x555559f42f40) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4326
#16 testing::TestInfo::Run (this=0x5555577b5180) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4326
#17 0x0000555555872031 in testing::TestInfo::Run (this=<optimized out>) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4299
#18 testing::TestSuite::Run (this=0x5555577b1910) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4480
#19 0x00005555558737b9 in testing::TestSuite::Run (this=<optimized out>) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4459
#20 testing::internal::UnitTestImpl::RunAllTests (this=0x5555577b1540) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:7320
#21 0x0000555555873ce8 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x5555558f42f0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x5555577b1540)
at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4082
#22 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x5555558f42f0 "auxiliary test code (environments or event listeners)",
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x5555558726d0 <testing::internal::UnitTestImpl::RunAllTests()>,
object=0x5555577b1540) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:4137
#23 testing::UnitTest::Run (this=<optimized out>) at /workspaces/kokkos/tpls/gtest/gtest/gtest-all.cc:6903
#24 0x000055555557b217 in RUN_ALL_TESTS () at /workspaces/kokkos/tpls/gtest/gtest/gtest.h:12371
#25 main (argc=<optimized out>, argv=0x7fffffffda98) at /workspaces/kokkos/core/unit_test/UnitTestMainInit.cpp:26
3 - Kokkos_CoreUnitTest_HIP (Timeout)
looks suspicious.
Retest this please.
Looks like it timed out again ...
Reproduced the CUDA failure. It only occurs in release mode, not debug. Testing with different CUDA versions to see if it's specific to 11.0.
Also, seems the last commit fixed the issue for cuda-11.0-RDC build, but not the cuda-11.0.3-clang-tidy one.
> Also, seems the last commit fixed the issue for cuda-11.0-RDC build, but not the cuda-11.0.3-clang-tidy one.
That's just the shared memory test that fails occasionally depending on other stuff running simultaneously.
I can't reproduce the HIP test failure locally. Let's try rerunning the CI one more time.
Retest this please.
> That's just the shared memory test that fails occasionally depending on other stuff running simultaneously.
I think the Default test was also failing for that configuration? We will see with this current round of CI.
A little more testing:
- It is an ordering issue among `--gtest_filter=*reduce_instantiation*` tests.
  - You can find orders of `reduce_instantiation` tests that pass and fail based on when `_c2` is called
  - Still need to find exactly what tests need to be run first
- I triggered using cuda 12.4, so not a cuda 11.0 specific issue.
- Release only
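One way to narrow down which tests have to run before `_c2` is a greedy reduction over a known failing order. The following is only a sketch under assumptions: `minimize_predecessors` and the `fails` callback are hypothetical names, and `fails` stands in for "run these tests in this order, then `_c2`, and report whether it crashed". How to force a specific order on the real binary is left open here.

```python
from typing import Callable, List, Sequence


def minimize_predecessors(
    predecessors: Sequence[str],
    fails: Callable[[List[str]], bool],
) -> List[str]:
    """Drop predecessor tests one at a time while the failure still reproduces.

    `predecessors` is the list of tests that ran before the failing test in a
    known bad order. `fails` is a user-supplied (hypothetical) callback that
    returns True when the failure reproduces with only the given predecessors.
    """
    needed = list(predecessors)
    if not fails(needed):
        raise ValueError("the full predecessor list does not reproduce the failure")

    i = 0
    while i < len(needed):
        candidate = needed[:i] + needed[i + 1:]
        if fails(candidate):
            # The test at position i is not required; drop it and retry the same index.
            needed = candidate
        else:
            # The test at position i is required; keep it and move on.
            i += 1
    return needed
```

The result is minimal only in the sense that removing any single remaining test makes the failure go away. It needs at most one run of `fails` per predecessor plus the initial check, which should be cheap enough for the handful of `reduce_instantiation` tests.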
Retest this please.
Do we still need to fix this?
Retest this please
Retest this please
One of the ORNL CUDA builds failed in OpenMP subview. No error message, it just failed to build that object file. I think this should be unrelated. Thoughts?
I think it's fine.
Retest this please