yash-rs icon indicating copy to clipboard operation
yash-rs copied to clipboard

Implication of suspending a process that is not a direct child of the job-controlling shell

Open magicant opened this issue 2 years ago • 0 comments

The C version of yash generates a process tree like below when executing yes | less with job control enabled:

   PPID     PID    PGID     SID TTY        TPGID STAT   UID   TIME COMMAND
 515770  516152  516152  516152 pts/0     516152 Ss+   1000   0:00 /bin/yash
 516152  592476  592476  516152 pts/0     516152 T     1000   0:00  \_ yes
 516152  592477  592476  516152 pts/0     516152 T     1000   0:00  \_ less

This tree is quite simple.

On the other hand, the current yash-rs constructs a tree like this:

   PPID     PID    PGID     SID TTY        TPGID STAT   UID   TIME COMMAND
 516152  591807  591807  516152 pts/0     591807 S+    1000   0:00  \_ ./yash
 591807  592231  592231  516152 pts/0     591807 T     1000   0:00      \_ ./yash
 592231  592232  592231  516152 pts/0     591807 T     1000   0:00          \_ ./yash
 592232  592234  592231  516152 pts/0     591807 T     1000   0:00          |   \_ yes
 592231  592233  592231  516152 pts/0     591807 T     1000   0:00          \_ ./yash
 592233  592235  592231  516152 pts/0     591807 T     1000   0:00              \_ less

This is more complicated. The main job-controlling shell first spawns a child in a separate process group so that the child can be managed as a job. The child creates another two children to run the pipeline components. Each of the grandchild performs fork and exec to start the external utility.

In the both scenarios, the less utility turns on the alternate screen mode of the terminal so that it can go back to the original screen without losing the scroll-back history when the utility is finished or suspended. When less receives the SIGTSTP signal, it restores the terminal state, uninstalls the signal handler, and resends the signal to itself to actually suspend itself.

This scheme works fine for the simple process tree created by the C yash as the less utility is a direct child of the shell. A problem arises when yash-rs makes a more complicated tree where less is not a direct child of the shell. Remember that, when the user hits the susp key (typically Ctrl-Z), the SIGTSTP signal is sent to all processes in the foreground process group and that signal delivery is asynchronous among the processes. One process may receive the signal and react to it earlier than another. When the direct child of the topmost shell process receives the signal, it immediately suspends and the parent shell will proceed to the next command, but at this moment other processes in the process group may still be running, especially the less utility may still be interacting with the terminal in the alternate screen. This may cause unwanted side effects when the shell starts prompting for the next command line or executing the next job.

How can this issue be resolved?

  • Forked shell processes in the job process group block the SIGTSTP signal to prevent itself from being suspended by the signal. The shell processes monitor the state of child processes. When the last child suspends, the parent unblocks the signal to let itself suspend. This scheme makes sure the parent shell process suspends after the child does.
    • To make it possible to suspend a job-controlled subshell executing a built-in utility, the subshell should block SIGTSTP only while it is executing another child process.
  • Implementing the same optimization as the C yash would make the situation much less likely to happen, if not a perfect resolution.
  • When a foreground process stops, the shell can wait for 0.1 seconds or so to allow other process group members to clean up the terminal state before the shell proceeds to the next command or prompt.
  • Any other ideas?

magicant avatar Sep 10 '23 04:09 magicant