
kernel: `waitpid` doesn't consume relevant `SIGCHLD` signal

Open · nalajcie opened this issue 2 years ago · 0 comments

This is a POSIX incompatibility that causes spurious errors (libc functions unexpectedly failing with errno set to EINTR) in processes that manage child processes. It was observed mostly in busybox ash and in the libc open() function.

The immediate cause is late delivery of SIGCHLD: the signal reaches the parent even though the child has already been reaped by the waitpid() call the parent was sleeping in.

The POSIX specification of wait() and waitpid() (https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html) requires them to discard a pending SIGCHLD signal that is associated with a successfully waited-for child process.
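The requirement can be turned into a small conformance check. This is a minimal C sketch assuming a POSIX environment with fork(), sigprocmask() and sigpending(); the function name sigchld_pending_after_waitpid() is hypothetical. SIGCHLD is blocked so that it stays pending and can be inspected with sigpending(). Since the process has only one child, the wait() specification's rule for a blocked SIGCHLD says the pending signal should be cleared once waitpid() returns that child's status, so a conforming system is expected to return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* No-op handler: SIGCHLD must not be ignored, or it may never become pending. */
static void noop_handler(int sig)
{
    (void)sig;
}

/*
 * Returns 1 if SIGCHLD is still pending after the only child was reaped,
 * 0 if waitpid() discarded it, or -1 on error.
 */
int sigchld_pending_after_waitpid(void)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask, pending;
    int result = -1;
    pid_t pid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGCHLD so the signal stays pending instead of being delivered. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);

    pid = fork();
    if (pid == 0)
        _exit(0);  /* child: terminate at once, generating SIGCHLD */

    if (pid > 0 && waitpid(pid, NULL, 0) == pid && sigpending(&pending) == 0)
        result = sigismember(&pending, SIGCHLD);

    /* Unblock first (any leftover SIGCHLD goes to the no-op handler),
     * then restore the original disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

Checking for a return value of 0 in a test program turns the requirement into a regression test; on a kernel showing the behavior described in this issue, a result of 1 would be the expected symptom.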

Steps to reproduce (tested on imx6ull):

  • run a small program that forces frequent context switches:
    #include <unistd.h>

    int main(void) {
      /* sleep briefly in a loop so the scheduler switches tasks often */
      while (1)
        usleep(10);
      return 0;
    }
    
  • in busybox ash run:
    while :; do sleep 1; echo "test" > /dev/null; done
    
  • result:
    /bin/sh: can't create /dev/null: EINTR
    open(/dev/null): resolve_path failed with errno: EINTR   # a debug message showing the exact origin of the error in libphoenix
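The ash loop from the steps above can also be approximated in plain C, which may make the failure easier to trace. This is a hypothetical reproducer, not taken from the original report, and it has not been run on Phoenix-RTOS; the function names are made up. The SIGCHLD handler is installed without SA_RESTART so that an interrupted open() reports EINTR instead of being silently restarted, so a SIGCHLD arriving after waitpid() has already reaped the child would show up as a non-zero count.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: its only purpose is to let SIGCHLD interrupt system calls. */
static void noop_sigchld(int sig)
{
    (void)sig;
}

/*
 * Hypothetical C version of the ash loop: fork a child, reap it with
 * waitpid(), then open /dev/null. Returns the number of open() calls that
 * failed with EINTR (expected 0), or -1 on setup failure.
 */
int count_open_eintr(int iterations)
{
    struct sigaction sa, old_sa;
    int failures = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: interrupted calls fail with EINTR */
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        int fd;

        if (pid < 0) {
            failures = -1;
            break;
        }
        if (pid == 0)
            _exit(0);  /* child: terminate immediately */

        /* Reap the child; retry if the wait itself is interrupted. */
        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
            ;

        /* The child is already reaped, so no SIGCHLD should interrupt this. */
        fd = open("/dev/null", O_WRONLY);
        if (fd >= 0)
            close(fd);
        else if (errno == EINTR)
            failures++;
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    return failures;
}
```

Running it next to the busy-loop program above would be the C counterpart of the ash experiment.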
    

The Open Group also notes that these semantics are needed to implement the system() libc function correctly: a call to system() must not cause a SIGCHLD handler installed by user code to be invoked for the child that system() created and reaped. This also makes a good test case for this issue.
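A sketch of that system() test case, assuming a POSIX libc and a /bin/sh for system() to run; the function and variable names are hypothetical. POSIX requires system() to block SIGCHLD while it waits for the command, so on a conforming implementation the user handler below should never run and the counter should stay at 0 after system() returns. A leftover pending SIGCHLD would instead be delivered when system() restores the signal mask, incrementing the counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t sigchld_calls;

/* User-level SIGCHLD handler that only counts its invocations. */
static void count_sigchld(int sig)
{
    (void)sig;
    sigchld_calls++;
}

/*
 * Runs system(cmd) with a counting SIGCHLD handler installed and stores the
 * number of handler invocations in *calls (expected to be 0). Returns the
 * value of system(), or -1 if the handler could not be installed.
 */
int system_sigchld_calls(const char *cmd, int *calls)
{
    struct sigaction sa, old_sa;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    sigchld_calls = 0;
    ret = system(cmd);  /* system() blocks SIGCHLD while waiting for cmd */
    *calls = sigchld_calls;

    sigaction(SIGCHLD, &old_sa, NULL);
    return ret;
}
```

For example, system_sigchld_calls("true", &calls) should return 0 with calls left at 0.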

Fixing this may require some redesign of signal generation, signal queuing, and waitpid(). The implementation needs to:

  • discard a pending SIGCHLD associated with the successfully waited-for child process
  • allow waitpid() to be called from inside a SIGCHLD signal handler (see the sample code in the Open Group description)
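The second requirement corresponds to the common pattern of a SIGCHLD handler that reaps every terminated child with waitpid(..., WNOHANG). This is a generic sketch, not the Open Group's sample code, and the function names are hypothetical. Because ordinary signals are not queued, several child exits can collapse into a single SIGCHLD, so the handler loops until waitpid() reports no more terminated children. With the first requirement in place, a SIGCHLD that is still pending for a child already reaped inside the handler would be discarded rather than triggering a redundant call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;

/* SIGCHLD handler: reap every child whose status is available. */
static void reap_children(int sig)
{
    int saved_errno = errno;  /* waitpid() may change errno; preserve it */

    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/*
 * Forks n children that exit immediately and sleeps until the handler has
 * reaped all of them. Returns the number of children reaped, or -1 on error.
 */
int reap_in_handler(int n)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask, wait_mask;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    /* Block SIGCHLD so no wake-up is lost between the check and sigsuspend(). */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chld, &old_mask) != 0)
        return -1;
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);  /* child: terminate immediately */
        if (pid < 0) {
            n = i;  /* only wait for the children that were created */
            break;
        }
    }
    while (reaped < n)
        sigsuspend(&wait_mask);  /* atomically unblock SIGCHLD and sleep */

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return reaped;
}
```

Note that waitpid(-1, ...) reaps any child of the process, so in a larger program such a handler has to cooperate with other code that waits for specific children.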

nalajcie, Sep 20 '21 10:09