compute-runtime
compute-runtime copied to clipboard
neo_page_fault_manager_tests randomly fails
still trying to package this up for alpine linux, as reported also in #277 (after fixing it) i have neo_page_fault_manager_tests
randomly failing, sometimes it succeeds, other times it just fails, see this sample execution:
while ./neo_page_fault_manager_tests ; do : ; done
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed: 0
=====================
== ULTs PASSED ==
=====================
Tests run: 16
Tests passed: 16
Tests skipped: 0
Tests failed: 0
Tests disabled: 0
Time elapsed: 0 ms
=====================
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed: 0
=====================
== ULTs PASSED ==
=====================
Tests run: 16
Tests passed: 16
Tests skipped: 0
Tests failed: 0
Tests disabled: 0
Time elapsed: 1 ms
=====================
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed: 0
./compute-runtime-20.08.15750/core/unit_tests/page_fault_manager/linux/cpu_page_fault_manager_linux_tests.cpp:23: Failure
Value of: pageFaultManager->handlerInvoked
Actual: false
Expected: true
[ FAILED ][ SKL ][ 0 ] PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
i tried to reproduce this in gdb to figure out what happens, however these tests never fail in gdb, making this a challenging heisenbug.
do you dear people have any idea why this does not consistently fails or succeeds but exhibits this random behaviour? and if not what could i try to debug this?
This is a really simple test which registers custom signal handler using sigaction, then raises SIGSEGV and checks if our handler is being called.
yeah, that's what i figured too, i thought maybe you do some random magic deeper in your mock/unit testing framework.
@stef does this tests starts to fail after some of your local changes OR it fails randomly on driver without any modifications ?
without my patches i cannot compile on alpine linux, so yeah i have these patches applied: https://github.com/aports-ugly/aports/tree/master/ugly/compute-runtime
i forgot to also note the modification that is mentioned in #277 supporting mmap without hugetlb, but i suspect that might be unrelated here.
The test that fails is actually quite simple
TEST_F(PageFaultManagerLinuxTest, whenPageFaultIsRaisedThenHandlerIsInvoked) { auto pageFaultManager = std::make_unique<MockPageFaultManagerLinux>(); EXPECT_FALSE(pageFaultManager->handlerInvoked); std::raise(SIGSEGV); EXPECT_TRUE(pageFaultManager->handlerInvoked); }
If it fails it means that handler was not invoked. Can you try to debug this with some other techniques to see why it happens? (like using printfs)
We should call in here on std::raise bool verifyPageFault(void *ptr) override { handlerInvoked = true; if (allowCPUMemoryAccessOnPageFault) { this->allowCPUMemoryAccess(ptr, size); } return returnStatus; }
yeah, i also tried to DISABLE_
prefix that testname and see if the other tests succeed consistently, but no. i'll try some printf tracing.
so i put trace output in the constructor, destructor and PageFaultManagerLinux::pageFaultHandlerWrapper()
. when everything goes right it gives this output:
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed:
installed sighandler: 0
called sighandler
uninstalled sighandler
installed sighandler: 0
called sighandler
uninstalled sighandler
installed sighandler: 0
called sighandler
uninstalled sighandler
installed sighandler: 0
called sighandler
uninstalled sighandler
installed sighandler: 0
uninstalled sighandler
installed sighandler: 0
uninstalled sighandler
=====================
== ULTs PASSED ==
=====================
Tests run: 16
Tests passed: 16
Tests skipped: 0
Tests failed: 0
Tests disabled: 0
Time elapsed: 0 ms
=====================
and when it fails it gives this output:
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed: 0
installed sighandler: 0
/home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/core/unit_tests/page_fault_manager/linux/cpu_page_fault_manager_linux_tests.cpp:23: Failure
Value of: pageFaultManager->handlerInvoked
Actual: false
Expected: true
uninstalled sighandler
[ FAILED ][ SKL ][ 0 ] PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
installed sighandler: 0
Is this a sporadic problem ? Can you check if handler is properly intialized here:
PageFaultManagerLinux::PageFaultManagerLinux() { pageFaultHandler = [&](int signal, siginfo_t *info, void *context) { if (!this->verifyPageFault(info->si_addr)) { callPreviousHandler(signal, info, context); } };
struct sigaction pageFaultManagerHandler = {};
pageFaultManagerHandler.sa_flags = SA_SIGINFO;
pageFaultManagerHandler.sa_sigaction = pageFaultHandlerWrapper;
auto retVal = sigaction(SIGSEGV, &pageFaultManagerHandler, &previousHandler);
UNRECOVERABLE_IF(retVal != 0);
}
Can you put printf in lambda to see if it is properly called when exception is thrown ?
Is this a sporadic problem ?
yeah it's non-deterministic. how non-deterministic? i just ran this:
fail=0; for a in $(seq 1000); do ./neo_page_fault_manager_tests || fail=$((fail+1)) ; done ; echo "fail" $fail
and it gave me this result: 483/1000
currently i have these fprintfs in the code:
PageFaultManagerLinux::PageFaultManagerLinux() {
pageFaultHandler = [&](int signal, siginfo_t *info, void *context) {
fprintf(stderr,"in lambda before prev handler\n");
if (!this->verifyPageFault(info->si_addr)) {
callPreviousHandler(signal, info, context);
}
fprintf(stderr,"in lambda after prev handler\n");
};
struct sigaction pageFaultManagerHandler = {};
pageFaultManagerHandler.sa_flags = SA_SIGINFO;
pageFaultManagerHandler.sa_sigaction = pageFaultHandlerWrapper;
auto retVal = sigaction(SIGSEGV, &pageFaultManagerHandler, &previousHandler);
fprintf(stderr, "installed sighandler: %d\n", retVal);
UNRECOVERABLE_IF(retVal != 0);
}
PageFaultManagerLinux::~PageFaultManagerLinux() {
if (!previousHandlerRestored) {
auto retVal = sigaction(SIGSEGV, &previousHandler, nullptr);
UNRECOVERABLE_IF(retVal != 0);
}
fprintf(stderr, "uninstalled sighandler\n");
}
void PageFaultManagerLinux::pageFaultHandlerWrapper(int signal, siginfo_t *info, void *context) {
fprintf(stderr, "called sighandler\n");
pageFaultHandler(signal, info, context);
}
when it fails this is the output:
set timeout to: 45
Iteration: 1
Iteration: 1. random_seed: 0
installed sighandler: 0
/home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/core/unit_tests/page_fault_manager/linux/cpu_page_fault_manager_linux_tests.cpp:23: Failure
Value of: pageFaultManager->handlerInvoked
Actual: false
Expected: true
uninstalled sighandler
[ FAILED ][ SKL ][ 0 ] PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
installed sighandler: 0
zsh: segmentation fault (core dumped) ./neo_page_fault_manager_tests
since it says: installed sighandler: 0
it means that sigaction
returns without error.
for reference: this is the test code that fails:
TEST(PageFaultManagerLinuxTest, whenPageFaultIsRaisedThenHandlerIsInvoked) {
auto pageFaultManager = std::make_unique<MockPageFaultManagerLinux>();
EXPECT_FALSE(pageFaultManager->handlerInvoked);
std::raise(SIGSEGV);
EXPECT_TRUE(pageFaultManager->handlerInvoked);
}
source: https://github.com/intel/compute-runtime/blob/03ce6681a052a6009ee10649617395a1e83138e5/shared/test/unit_test/page_fault_manager/linux/cpu_page_fault_manager_linux_tests.cpp#L26
it fails in the last line where it expects the handlerInvoked
.
but if i disable this, then some later unit test fails in this same file.
looking at strace log of a failing process:
32044 rt_sigaction(SIGSEGV, {sa_handler=0x558f235d5b20, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7fb87357328f}, {sa_handler=0x558f23252bc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb87357328f}, 8) = 0
32044 writev(2, [{iov_base="installed sighandler: 0x7ffdc143"..., iov_len=39}, {iov_base=NULL, iov_len=0}], 2) = 39
32044 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [TRAP ABRT BUS USR1 SEGV PIPE ALRM TERM STKFLT CONT URG XFSZ SYS RTMIN RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], 8) = 0
32044 tkill(32044, SIGSEGV) = 0
32044 rt_sigprocmask(SIG_SETMASK, [TRAP ABRT BUS USR1 SEGV PIPE ALRM TERM STKFLT CONT URG XFSZ SYS RTMIN RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], NULL, 8) = 0
32044 writev(1, [{iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 2) = 1
32044 writev(1, [{iov_base="", iov_len=0}, {iov_base=..."comput"..., iov_len=225}], 2) = 225
32044 writev(1, [{iov_base="Expected: true", iov_len=14}, {iov_base="\n", iov_len=1}], 2) = 15
32044 rt_sigaction(SIGSEGV, {sa_handler=0x558f23252bc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb87357328f}, NULL, 8) = 0
32044 writev(2, [{iov_base="uninstalled sighandler\n", iov_len=23}, {iov_base=NULL, iov_len=0}], 2) = 23
for comparison here is the strace of a successful test:
365 rt_sigaction(SIGSEGV, {sa_handler=0x55ad0b126b20, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f6a6cc5228f}, {sa_handler=0x55ad0ada3bc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags= SA_RESTORER|SA_RESTART, sa_restorer=0x7f6a6cc5228f}, 8) = 0
365 writev(2, [{iov_base="installed sighandler: 0x7fff7d1370e0 0\n", iov_len=39}, {iov_base=NULL, iov_len=0}], 2) = 39
365 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [BUS FPE USR1 ALRM TERM CHLD CONT TTIN XFSZ PROF WINCH IO PWR SYS RT_1 RT_2 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], 8) = 0
365 tkill(365, SIGSEGV) = 0
365 rt_sigprocmask(SIG_SETMASK, [BUS FPE USR1 ALRM TERM CHLD CONT TTIN XFSZ PROF WINCH IO PWR SYS RT_1 RT_2 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], NULL, 8) = 0
365 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=365, si_uid=1000} ---
365 writev(2, [{iov_base="called sighandler\n", iov_len=18}, {iov_base=NULL, iov_len=0}], 2) = 18
365 writev(2, [{iov_base="in lambda before prev handler\n", iov_len=30}, {iov_base=NULL, iov_len=0}], 2) = 30
365 writev(2, [{iov_base="in lambda after prev handler\n", iov_len=29}, {iov_base=NULL, iov_len=0}], 2) = 29
365 rt_sigreturn({mask=[BUS FPE USR1 ALRM TERM CHLD CONT TTIN XFSZ PROF WINCH IO PWR SYS RT_1 RT_2 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15]}) = 0
365 rt_sigaction(SIGSEGV, {sa_handler=0x55ad0ada3bc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f6a6cc5228f}, NULL, 8) = 0
365 writev(2, [{iov_base="uninstalled sighandler\n", iov_len=23}, {iov_base=NULL, iov_len=0}], 2) = 23
from comparing the two straces it seems that there is no signal delivered to the process after tkill
the parameters to sigprocmask seem to be different around tkill.
so the important detail from the strace logs is immediately before the quoted part, it comes after
Iteration: 1. random_seed: 0
is printed on the console, for the successful test it is this:
rt_sigprocmask(SIG_SETMASK, [BUS FPE KILL USR1 ALRM TERM CHLD CONT TTIN XFSZ PROF WINCH IO PWR SYS ...
for the failing test it is:
rt_sigprocmask(SIG_SETMASK, [TRAP ABRT BUS FPE KILL USR1 SEGV USR2 PIPE STKFLT CHLD XFSZ PROF WINCH
debugging UnitTestImpl::RunAllTests()
now.
i've been trying to find out where this rt_sigprocmask(SIG_SETMASK, [
is called from, so i plastered https://github.com/intel/compute-runtime/blob/03ce6681a052a6009ee10649617395a1e83138e5/third_party/gtest/gmock-gtest-all.cc#L3965 with trace output. it seems to come from
internal::HandleExceptionsInMethodIfSupported(
this, &Test::TestBody, "the test body");
what is interesting, this Test::Run()
is being called 10 times before the testcase PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
this is how i modified Test::Run
:
void Test::Run() {
if (!HasSameFixtureClass()) return;
fprintf(stderr,"1\n");
internal::UnitTestImpl* const impl = internal::GetUnitTestImpl();
fprintf(stderr,"2\n");
impl->os_stack_trace_getter()->UponLeavingGTest();
fprintf(stderr,"3\n");
internal::HandleExceptionsInMethodIfSupported(this, &Test::SetUp, "SetUp()");
fprintf(stderr,"4\n");
// We will run the test only if SetUp() was successful and didn't call
// GTEST_SKIP().
if (!HasFatalFailure() && !IsSkipped()) {
impl->os_stack_trace_getter()->UponLeavingGTest();
fprintf(stderr,"5\n");
internal::HandleExceptionsInMethodIfSupported(
this, &Test::TestBody, "the test body");
fprintf(stderr,"6\n");
}
// However, we want to clean up as much as possible. Hence we will
// always call TearDown(), even if SetUp() or the test body has
// failed.
fprintf(stderr,"7\n");
impl->os_stack_trace_getter()->UponLeavingGTest();
fprintf(stderr,"8\n");
internal::HandleExceptionsInMethodIfSupported(
this, &Test::TearDown, "TearDown()");
fprintf(stderr,"9\n");
}
and this is the strace (annotated) output:
...
unlink("igdrcl.log") = -1 ENOENT (No such file or directory)
brk(0x562ec7edb000) = 0x562ec7edb000
writev(1, [{iov_base="Iteration: 1", iov_len=12}, {iov_base="\n", iov_len=1}], 2) = 13
writev(1, [{iov_base="Iteration: 1. random_seed: 0", iov_len=28}, {iov_base="\n", iov_len=1}], 2) = 29
here is the first interesting output from the testcase: Iteration: 1. random_seed: 0
brk(0x562ec7edc000) = 0x562ec7edc000
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
here is the first invocation of Test::Run()
brk(0x562ec7edc000) = 0x562ec7edc000
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
brk(0x562ec7edd000) = 0x562ec7edd000
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
this is the 5th trace output, so now comes the invocation of internal::HandleExceptionsInMethodIfSupported
, with a lot of rt_sigprocmask(SIG_SETMASK,
calls, the last one being the one of interest:
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd0c2c0000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd0c056000
mincore(0x7ffd4fe4ab1f, 1, 0x7ffd4fe4ab1f) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
msync(0x7ffd4fe4b000, 4096, MS_ASYNC) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [TRAP ABRT BUS FPE KILL SEGV USR2 ALRM STKFLT STOP TTOU URG XCPU XFSZ VTALRM PROF WINCH SYS RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], NULL, 8) = 0
after this Test::Run()
finishes, and is called 9 more times according to the trace output of Test::Run()
:
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
brk(0x562ec7ede000) = 0x562ec7ede000
brk(0x562ec7edf000) = 0x562ec7edf000
brk(0x562ec7ef1000) = 0x562ec7ef1000
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
finally we get into the Test::Run()
that runs PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
:
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
rt_sigaction(SIGSEGV, {sa_handler=0x562eb190db20, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7fdd0c27528f}, {sa_handler=0x562eb158abc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdd0c27528f}, 8) = 0
writev(2, [{iov_base="installed sighandler: 0x7ffd4fe4bb10 0\n", iov_len=39}, {iov_base=NULL, iov_len=0}], 2) = 39
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [TRAP ABRT BUS FPE SEGV USR2 ALRM STKFLT TTOU URG XCPU XFSZ VTALRM PROF WINCH SYS RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], 8) = 0
tkill(7979, SIGSEGV) = 0
rt_sigprocmask(SIG_SETMASK, [TRAP ABRT BUS FPE SEGV USR2 ALRM STKFLT TTOU URG XCPU XFSZ VTALRM PROF WINCH SYS RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], NULL, 8) = 0
writev(1, [{iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 2) = 1
writev(1, [{iov_base="", iov_len=0}, {iov_base="compute-runtime-20.08.15750/core/unit_tests/page_fault_manager/linux/cpu_page_faul"..., iov_len=225}], 2) = 225
writev(1, [{iov_base="Expected: true", iov_len=14}, {iov_base="\n", iov_len=1}], 2) = 15
rt_sigaction(SIGSEGV, {sa_handler=0x562eb158abc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdd0c27528f}, NULL, 8) = 0
writev(2, [{iov_base="uninstalled sighandler\n", iov_len=23}, {iov_base=NULL, iov_len=0}], 2) = 23
writev(2, [{iov_base="6\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="7\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="8\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="9\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(1, [{iov_base="[ FAILED ][ SKL ][ 0 ] PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked", iov_len=92}, {iov_base="\n", iov_len=1}], 2) = 93
writev(2, [{iov_base="1\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="2\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="3\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="4\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
writev(2, [{iov_base="5\n", iov_len=2}, {iov_base=NULL, iov_len=0}], 2) = 2
rt_sigaction(SIGSEGV, {sa_handler=0x562eb190db20, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7fdd0c27528f}, {sa_handler=0x562eb158abc0, sa_mask=~[KILL STOP RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdd0c27528f}, 8) = 0
writev(2, [{iov_base="installed sighandler: 0x7ffd4fe4ba20 0\n", iov_len=39}, {iov_base=NULL, iov_len=0}], 2) = 39
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd0c046000
mprotect(0x7fdd0c046000, 65536, PROT_NONE) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=7979, si_uid=1000} ---
+++ killed by SIGSEGV (core dumped) +++
and as seen there is no segfault delivered after the tkill invocation.
My understanding was, that the first test case is PageFaultManagerLinuxTest.whenPageFaultIsRaisedThenHandlerIsInvoked
, now i wonder what the previous 10 calls are, and especially what the first case is that sets the sigprocmask? Also i wonder why the first call to internal::HandleExceptionsInMethodIfSupported
does so much more than in the latter cases where it does nothing, and also why this function is non-deterministic?
after some more investigating, the sigprocmask is set in https://github.com/intel/compute-runtime/blob/a1c6b0f4a7442513424dddc9b296ebe4d53b9ef7/shared/test/unit_test/page_fault_manager/cpu_page_fault_manager_tests.cpp#L58
this also explains where those 10 extra test cases are coming from, which i wondered about in the previous comment.
it seems to boil down to this macro and its try/catch: https://github.com/intel/compute-runtime/blob/0e202aaf67e7bb765f1faeafff8bf372781ae707/third_party/gtest/gtest/gtest.h#L7750
This macro is not used in this test so that is intresting. Does it ever fail if you only run this one test ? (gtest filter may be used to selectively only run one test)
the problem is that this macro/test does this:
rt_sigprocmask(SIG_SETMASK, [TRAP ABRT BUS FPE KILL SEGV USR2 ALRM STKFLT STOP TTOU URG XCPU XFSZ VTALRM PROF WINCH SYS RT_1 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8 RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], NULL, 8) = 0
the problem is it sets also SEGV
and this sigprocmask is never reset later, and then in a later tests that does this:
TEST(PageFaultManagerLinuxTest, whenPageFaultIsRaisedThenHandlerIsInvoked) {
auto pageFaultManager = std::make_unique<MockPageFaultManagerLinux>();
EXPECT_FALSE(pageFaultManager->handlerInvoked);
std::raise(SIGSEGV);
EXPECT_TRUE(pageFaultManager->handlerInvoked);
}
the std::raise
does not trigger the pagefault handler and thus the EXPECT_TRUE
fails.
but strangely this rt_sigprocmask
is called half of the time with these params, and half of the time with other params, that do not set SEGV
. Using my gdb i've descendet into the bowels of libstdc++ and am no looking at what happens between the throw and the catch, i have a suspicion that libunwind might be somehow involved.
There are no plans for proactive enabling and distro adoption except for Ubuntu. We are open to accepting contributions (pull requests) that unblock support.