
Order of operations test failure for reducer reset on host.

Open rchen20 opened this issue 3 years ago • 3 comments

test-reducer-reset-seq passes or fails depending on the order of the reset calls when built with NVCC/10.1.243 + GCC/8.3.1 (https://github.com/LLNL/RAJA/pull/1207).

When maxloc is reset after minloc (https://github.com/LLNL/RAJA/blob/8e8cc96cbccda2bfc33b14b57a8591b2cf9ca342/test/unit/reducer/tests/test-reducer-reset.hpp#L144-L145), the test fails at the maxloc get call (https://github.com/LLNL/RAJA/blob/8e8cc96cbccda2bfc33b14b57a8591b2cf9ca342/test/unit/reducer/tests/test-reducer-reset.hpp#L157).
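The property the test checks can be sketched without RAJA: after a reset to a given value, get should return that value no matter which reducer was reset first. `ToyMaxLoc`, `ToyMinLoc`, and `reset_then_get_matches` below are hypothetical stand-ins written for illustration; they are not RAJA's reducers and do not reproduce the failure.

```cpp
#include <limits>

// Hypothetical stand-in for a max-with-location reducer (NOT RAJA's type).
template <typename T>
struct ToyMaxLoc {
  T val = std::numeric_limits<T>::lowest();  // identity for a max reduction
  int loc = -1;
  void reset(T v, int l) { val = v; loc = l; }
  T get() const { return val; }
  int getLoc() const { return loc; }
};

// Hypothetical stand-in for a min-with-location reducer (NOT RAJA's type).
template <typename T>
struct ToyMinLoc {
  T val = std::numeric_limits<T>::max();  // identity for a min reduction
  int loc = -1;
  void reset(T v, int l) { val = v; loc = l; }
  T get() const { return val; }
  int getLoc() const { return loc; }
};

// Resets both reducers in the requested order, then reports whether both
// get() calls return the reset value.
template <typename T>
bool reset_then_get_matches(T reset_val, bool maxloc_last) {
  ToyMaxLoc<T> maxloc;
  ToyMinLoc<T> minloc;
  if (maxloc_last) {
    minloc.reset(reset_val, 0);
    maxloc.reset(reset_val, 0);
  } else {
    maxloc.reset(reset_val, 0);
    minloc.reset(reset_val, 0);
  }
  return maxloc.get() == reset_val && minloc.get() == reset_val;
}
```

For a well-behaved reducer, `reset_then_get_matches<int>(10, true)` and `reset_then_get_matches<int>(10, false)` both hold; the failure reported here corresponds to the first of these returning false.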

The test failure shows that get returned the default-initialized values:

[chen59@rzansel9:reducer]$ ../../test-reducer-reset-seq.exe
Running main() from /usr/WS1/chen59/allraja/rajaatomicexhaustive/raja_git_atomicexhaustive/blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SequentialResetTest/ReducerResetUnitTest/0, where TypeParam = camp::list<RAJA::policy::sequential::seq_reduce, int, camp::resources::v1::Host, forone_seq>
[ RUN ] SequentialResetTest/ReducerResetUnitTest/0.BasicReset
/usr/WS1/chen59/allraja/rajaatomicexhaustive/raja_git_atomicexhaustive/test/unit/reducer/tests/test-reducer-reset.hpp:157: Failure
Expected equality of these values:
  (NumericType)reduce_maxloctup.get()
    Which is: -2147483648
  (NumericType)(resetVal[0])
    Which is: 10
[ FAILED ] SequentialResetTest/ReducerResetUnitTest/0.BasicReset, where TypeParam = camp::list<RAJA::policy::sequential::seq_reduce, int, camp::resources::v1::Host, forone_seq> (262 ms)
[----------] 1 test from SequentialResetTest/ReducerResetUnitTest/0 (262 ms total)

[----------] 1 test from SequentialResetTest/ReducerResetUnitTest/1, where TypeParam = camp::list<RAJA::policy::sequential::seq_reduce, float, camp::resources::v1::Host, forone_seq>
[ RUN ] SequentialResetTest/ReducerResetUnitTest/1.BasicReset
/usr/WS1/chen59/allraja/rajaatomicexhaustive/raja_git_atomicexhaustive/test/unit/reducer/tests/test-reducer-reset.hpp:157: Failure
Expected equality of these values:
  (NumericType)reduce_maxloctup.get()
    Which is: -3.40282e+38
  (NumericType)(resetVal[0])
    Which is: 10
[ FAILED ] SequentialResetTest/ReducerResetUnitTest/1.BasicReset, where TypeParam = camp::list<RAJA::policy::sequential::seq_reduce, float, camp::resources::v1::Host, forone_seq> (0 ms)
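The two values in the log, -2147483648 and -3.40282e+38, are the lowest representable int and float. That is what a max reduction would hold if it were seeded with its identity and never updated, which is consistent with the reset not taking effect. The sketch below only shows where those numbers come from; `max_identity` and `as_text` are hypothetical helpers, and the assumption that RAJA seeds its max reducers this way is not confirmed by the log.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Assumed identity for a max reduction: the lowest representable value.
// Hypothetical helper, not a RAJA function.
template <typename T>
constexpr T max_identity() {
  return std::numeric_limits<T>::lowest();
}

// Formats a value with default stream formatting (six significant
// digits for floating point), similar to how the log prints it.
template <typename T>
std::string as_text(T value) {
  std::ostringstream out;
  out << value;
  return out.str();
}
```

Formatting `max_identity<int>()` and `max_identity<float>()` with `as_text` gives the same strings that appear after "Which is:" in the failing assertions.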

rchen20 avatar Feb 10 '22 20:02 rchen20

@trws @rhornung67 After adding -fsanitize=undefined to the build and link lines, the test passes regardless of the ordering of reset calls. I'm rather baffled . . . did I use ubsan properly?

rchen20 avatar Feb 10 '22 20:02 rchen20

I can think of nothing else to say except, that's awesome! I think we have some memory leak checking turned on for some GitLab CI jobs.

rhornung67 avatar Feb 10 '22 21:02 rhornung67

It sounds like you did, but that makes me think it's time to turn on asan with leakcheck and go again. If it's not UB, that's almost worse: it makes me wonder whether we have an invalid memory access in there somewhere.

trws avatar Feb 10 '22 23:02 trws