sgx-lkl
[Test] Fix and enable 10 tests disabled with PR 789
The tests below were disabled in PR https://github.com/lsds/sgx-lkl/pull/789/files
- [ ] gettimeofday02
- [ ] mmap11
- [ ] futex_cmp_requeue01
- [ ] getcwd04
- [ ] send01
- [ ] setresuid04
- [ ] setreuid07
- [ ] symlink01
- [ ] fstat03 (Disabled in https://github.com/lsds/sgx-lkl/pull/812)
- [ ] chroot03 (Disabled in https://github.com/lsds/sgx-lkl/pull/812)
Fix the cause of each failure and re-enable the test. cc @KenGordon @SeanTAllen @davidchisnall @vtikoo @paulcallen
gettimeofday02 passing previously appears to have been happenstance related to async signal bugs. Any test that depends on the delivery of an async signal is likely to fail. gettimeofday02 should either be heavily patched or left disabled with a note to enable it once async signal handling is fixed (which is a p0 issue).
The test in its current state hangs because the alarm that is meant to stop the test isn't being delivered, which is a known open issue:
https://github.com/lsds/sgx-lkl/issues/209
Is there some place where we want to record that gettimeofday02 should be re-enabled once #209 is closed?
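To make the dependency on async signal delivery concrete, here is a minimal sketch of the pattern described above: a loop that calls gettimeofday until a SIGALRM handler sets a flag. It is not the LTP source. The function name `run_until_alarm` and the fallback deadline are assumptions added for illustration, so that a missing signal shows up as a return value rather than a hang.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Set by the SIGALRM handler; polled by the loop below. */
static volatile sig_atomic_t done;

static void on_alarm(int sig)
{
    (void)sig;
    done = 1;
}

/*
 * Hypothetical helper: loop on gettimeofday() until SIGALRM arrives.
 * Returns 1 if the signal stopped the loop, 0 if the fallback deadline
 * expired first (the signal was not delivered in time), -1 on setup error.
 */
int run_until_alarm(long alarm_ms, long fallback_ms)
{
    struct sigaction sa;
    struct itimerval timer, off;
    struct timeval tv;
    struct timespec start, now;
    long elapsed_ms;

    done = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&timer, 0, sizeof(timer));
    memset(&off, 0, sizeof(off));
    timer.it_value.tv_sec = alarm_ms / 1000;
    timer.it_value.tv_usec = (alarm_ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (!done) {
        gettimeofday(&tv, NULL); /* stands in for the call under test */
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                     (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= fallback_ms) {
            setitimer(ITIMER_REAL, &off, NULL); /* disarm the pending timer */
            return 0;
        }
    }
    return 1;
}
```

Without the fallback deadline, a runtime that never delivers SIGALRM leaves the loop spinning forever, which matches the hang described here.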
mmap11 "failure" is unrelated to the functionality under test. It appears to be a deterministic shutdown hang. For me, it happens with both hw and sw modes.
Given that https://github.com/lsds/sgx-lkl/pull/788 exists and will change the shutdown sequence, I propose waiting to address the mmap11 failure until 788 is merged.
from @vtikoo:
mmap11 creates a detached pthread - https://github.com/lsds/ltp/blob/sgx-lkl/testcases/kernel/syscalls/mmap/mmap11.c#L101. There's an open p0 for fixing detached thread support: #779.
futex_cmp_requeue01 hangs because
while (thread_cnt < tc->num_waiters) {
sched_yield();
}
never exits.
Here's the full test:
https://github.com/lsds/ltp/blob/sgx-lkl/testcases/kernel/syscalls/futex/futex_cmp_requeue01.c
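For diagnosing this kind of hang, a bounded variant of the yield loop can help. The sketch below is an assumption-laden illustration, not the test's code: `wait_for_waiters` and its timeout parameter are hypothetical, and the counter is modelled as a C11 `atomic_int`. It spins with sched_yield like the loop above but gives up after a deadline, so a counter that never advances is reported instead of blocking the run.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Milliseconds elapsed on the monotonic clock since *start. */
static long ms_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: yield until *thread_cnt reaches num_waiters.
 * Returns 0 on success, -1 if timeout_ms passes first.
 */
int wait_for_waiters(atomic_int *thread_cnt, int num_waiters, long timeout_ms)
{
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (atomic_load(thread_cnt) < num_waiters) {
        if (ms_since(&start) >= timeout_ms)
            return -1; /* waiters never reported in */
        sched_yield();
    }
    return 0;
}
```

A timeout here only tells you that the waiter threads never ran far enough to bump the counter; it does not say whether the cause is sched_yield itself or the scheduler not running the other threads.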
a PR has been opened to address the write05 test:
https://github.com/lsds/ltp/pull/73
Regarding futex_cmp_requeue01, sched_yield now goes via LKL instead of directly calling lthread_yield - https://github.com/lsds/sgx-lkl-musl/pull/18/files#diff-687e538b71be7b81c2d4ddf641470487.
This could be a regression. Is this a deterministic failure?
@vtikoo it is for me.
I do believe that the sched_yield change is causing problems. I think it's one of the reasons why I see shutdown issues with DotNet here: https://github.com/lsds/sgx-lkl/pull/788#issue-467907462
getcwd04 exits early because it checks that there is more than one CPU:
if (tst_ncpus() == 1)
tst_brk(TCONF, "This test needs two cpus at least");
If that were fixed by removing the CPU check (I don't think it is needed given how we patched the test to use threads), it would then fail because it relies on the delivery of an asynchronous signal. It should be possible to re-enable it once #209 is fixed.
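As a rough illustration of the guard discussed above, the sketch below queries the number of online CPUs with sysconf and applies the same "exactly one CPU" condition. The helper names `online_cpus` and `should_skip_for_cpus` are hypothetical, and using `_SC_NPROCESSORS_ONLN` is an assumption about how such a count is typically obtained, not a statement about LTP's internals.

```c
#include <unistd.h>

/* Number of CPUs currently online as reported by the C library. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Mirrors the guard in the test: skip when exactly one CPU is visible. */
int should_skip_for_cpus(long ncpus)
{
    return ncpus == 1;
}
```

Calling `should_skip_for_cpus(online_cpus())` inside the enclave would show whether the environment reports a single CPU and therefore trips the guard.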
setresuid04 and setreuid07 are working and can be re-enabled.
@prp there seem to be multiple ways in which cloned host tasks can hang. Could you clarify whether you think the DotNet failures are specifically related to sched_yield or to cloned host task hangs in general?
In most cases, I see a deadlock in which the termination thread fails to obtain a CPU lock for syscalls but nothing else is running. The DotNet hang is different: one of the DotNet userspace threads keeps invoking sched_yield() and making futex calls, while the termination thread is waiting for the CPU lock.
send01 is failing because it hangs: a thread calls select, and that call never returns. I'm not sure why it was passing previously.
A couple of things that won't work as a fix right now:
- Using pthread_cancel to cancel the thread. Currently it segfaults. pthread_cancel uses signals, so it's problematic.
- Setting a timeout on the select call. It doesn't always time out, for reasons that I haven't looked into yet.
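For reference, this is roughly what the select timeout workaround looks like on a system where timeouts behave as expected. It is a minimal sketch, not the send01 code: `wait_readable` is a hypothetical helper, and the pipe in the usage note is only there to give select a descriptor with no data.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

On a regular Linux host, calling this on the read end of an empty pipe should return 0 once the timeout expires, and 1 after a byte is written to the other end. Running the same check inside the enclave is one way to isolate the timeout behaviour from the rest of send01.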
write05 patched with https://github.com/lsds/ltp/pull/73 and then with lsds/ltp#74
Tests fstat03 and chroot03 are also failing due to the SIGSEGV (page fault) signal issue. These two were not caught before because there were build errors and the test binaries were not generated (created issue https://github.com/lsds/sgx-lkl/issues/810 to track that). Fixed the build issue with https://github.com/lsds/ltp/pull/74 and disabled these two tests in PR https://github.com/lsds/sgx-lkl/pull/812