sgx-lkl icon indicating copy to clipboard operation
sgx-lkl copied to clipboard

Everything that is bad about signal passing

Open KenGordon opened this issue 4 years ago • 6 comments

Catch all for fixes to hopefully solve a whole bunch of actual bugs.

KenGordon avatar Aug 03 '20 13:08 KenGordon

We run signals here: https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/lkl/kernel/syscalls.c#L234

This does a loop here, popping signals off the pending list and delivering them:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/lkl/kernel/signal.c#L68

There are a few things wrong here:

  • Signals are not delivered to the right thread, they are delivered to whichever thread returns from a syscall.
  • When a syscall returns early as a result of a signal being delivered, the signal delivery code does not set the correct return value.
  • When a signal is delivered, it does not wake up pending tasks.

I think the correct fix is to:

  1. Iterate through pending signals.
  2. Associate them with the task structs for the correct Linux threads.
  3. Wake up any of the tasks that are currently in the sleeping state.
  4. Run the scheduler to ensure that they actually wake.
  5. Handle only the signals intended for the current thread on syscall return.

We probably need to do steps 1-4 in the idle task loop so that we wake up threads with pending signals even if no threads issue syscalls:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/lkl/kernel/syscalls.c#L277

davidchisnall avatar Aug 03 '20 15:08 davidchisnall

Our code for injecting signals into LKL looks quite different from that of other Linux architectures:

  • We handle signals by directly invoking a handler here: https://github.com/lsds/lkl/blob/eecb2a320f3f2dd85942526df0a3d3abd402912c/arch/lkl/kernel/signal.c#L38
  • In contrast, other Linux architectures create a signal frame and then call into the kernel: https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/mips/kernel/signal.c#L846. This may mean that the signal is associated with the correct thread because a context switch can happen before the signal handler is invoked (?).
  • Note that for synchronous signals, we switch to a host task explicitly here: https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/lkl/kernel/traps.c#L40

prp avatar Aug 03 '20 16:08 prp

Note that for synchronous signals, we switch to a host task explicitly

This is the correct thing to do for sync signals because they are delivered to an lthread, but the Linux scheduler may be pointing at the wrong thread. This won't work with async signals because they'd run in the correct host task but the wrong lthread.

davidchisnall avatar Aug 03 '20 17:08 davidchisnall

This is a catch all issue for things to do with signals.

See also

https://github.com/lsds/sgx-lkl/issues/680 Incomplete/unsafe signal handling with SGX1 https://github.com/lsds/sgx-lkl/issues/644 Signals not delivered to correct thread https://github.com/lsds/sgx-lkl/issues/700 [TEST]: Signal mask is unset for the signal during its handler execution (maybe a dup of 644) https://github.com/lsds/sgx-lkl/issues/518 [Tests] Issues with signal delivery using tgkill() to threads https://github.com/lsds/sgx-lkl/issues/209 Async signals aren't always delivered as expected

KenGordon avatar Aug 07 '20 13:08 KenGordon

Enable busybox ping network test when this bug is fixed. https://github.com/lsds/sgx-lkl/pull/809 cc @KenGordon @SeanTAllen

hukoyu avatar Aug 20 '20 17:08 hukoyu

See also https://github.com/lsds/sgx-lkl/issues/838 for the futex issue that prevents tgkill01 and rt_sigqueueinfo tests passing.

KenGordon avatar Sep 28 '20 15:09 KenGordon