dvtm icon indicating copy to clipboard operation
dvtm copied to clipboard

Closing a shell causes window to hang on OS X 10.11

Open gridsystem opened this issue 8 years ago • 26 comments

There's several ways to get to this frozen shell problem since upgrading to 10.11. I don't know enough about C to help, but I would take a wild guess that it relates to OS X's new SIPS.

  • Quit a window with the key binding (e.g. in my case C-d q)
  • Type exit at a shell
  • Open and then close the buffer mode pager thingy (e.g. in my case C-d e to open a scrollback buffer in less which I gather has something to do with copying and pasting but I just use it for scrolling)

This causes the window to basically freeze, In essence I can spawn windows but not close them.

On @martanne's advice I ran the following after intentionally causing the issue.

screen shot 2016-02-18 at 12 31 57

Which seemed to point to this line of code.

screen shot 2016-02-18 at 12 31 29

I've tried this with clean user configs (bashrc, profile etc), on two different machines and with clean installs of dvtm using the unmodified default config.

Hope someone can help because I've been using this thing daily for years now and I really miss it!

gridsystem avatar Feb 22 '16 15:02 gridsystem

And in addition, this happens with at least 0.13, 0.14 and 0.15.

gridsystem avatar Feb 22 '16 15:02 gridsystem

And this is what it looks like. The top window has frozen after printing 'Exit'.

screen shot 2016-02-22 at 15 21 20

gridsystem avatar Feb 22 '16 15:02 gridsystem

@martanne is there anything I can do to help here? I'm holding off on submitting a pull request for the homebrew formula until it works incase it needs another patch, but I guess I can raise an issue there and see if anyone can lend a hand?

gridsystem avatar Feb 23 '16 20:02 gridsystem

As I told you I'm not really familiar with Mac OS X and have no system at hand to debug it. Did anyone else encounter the same issue?

I guess for some reason the SIGCHLD which is sent to dvtm when a child process terminates is lost.

What happens when you run some other process instead of a shell? I.e. start dvtm as in dvtm python which should drop you into a python interpreter. Type CTRL-D to exit python. Is dvtm stuck? Does the old window still exist? Can you create new windows?

Try compiling dvtm with make debug and run it with dvtm 2> log in another terminal run tail -f log do you see any messages of the form "child with pid NNN died"?

Also what state are processes in after you type exit? Use something like pstree to find the process id of the hanging process (obviously before typing exit). Check its status with ps PID then type exit and do the same again.

martanne avatar Feb 28 '16 12:02 martanne

Did anyone else encounter the same issue?

Same behavior here on OS X 10.11. I'll gather answers to your questions and report back.

hakujin avatar Feb 28 '16 23:02 hakujin

What happens when you run some other process instead of a shell? Type CTRL-D to exit python. Is dvtm stuck?

No, if I don't attempt to open other windows. It waits for input before redrawing the terminal after exit but otherwise okay.

Does the old window still exist? Can you create new windows?

Can create new windows. Seems to hang as soon as I exit one of these. The other windows exist but similarly hang when attempting to exit.

do you see any messages of the form "child with pid NNN died"?

Infrequently a window will terminate successfully and do I see "child with pid NNN died" in the log. I do not see it when the window hangs, which is the overwhelming majority of the time.

Also what state are processes in after you type exit?

Before:

PID   TT  STAT      TIME COMMAND
8588 s008  Ss+    0:00.02 /bin/zsh

After:

PID   TT  STAT      TIME COMMAND
8588   ??  Z      0:00.00 (zsh)

hakujin avatar Mar 01 '16 00:03 hakujin

I've created a bounty for this issue! Please share it if you know anyone who can contribute to Mac compatibility :)

gridsystem avatar Mar 04 '16 17:03 gridsystem

I decided to give dvtm a try and see if I could diagnose/fix the issue. There's more problems than just this on Mac OS X. It doesn't compile out of the box, giving me an error about SIGWINCH. So, first issue, can't compile without further investigation that I simply haven't done. Alright, well, let's just see if I can replicate the issue. So, I download the binary with homebrew, it installs fine, it runs. Issues immediately.

It doesn't seem to understand backspace. Instead, just printing a blank space (it's actually printing a space instead of the ^H character representing a backspace and failing to do properly erase the letter on the terminal itself, hitting return does indeed indicate that the backspaced corrections occured, so it's just a display error. This will be related to the default terminal setting that dvtm exports of dvtm-256color not being defined on Mac OS X anywhere. export TERM=screen-256color resolves this, but, is technically an incorrect solution.

When running dvtm from inside of iterm2, it exhibits the freezing issue mentioned in this issue every time I've tried it. There seems to be no intermitten portion to the issue when running it under those conditions. When I run it from 'terminal' it still exhibits the backspace issue, but the freezing no longer occurs and the terminal exits cleanly. None of the hotkeys appear to work to create or tile more windows after launching dvtm from either application.

Will look into it more later than just a passing try to compile and run after work, but, as just a quick glance, there's more issues to be had when attempting to run under Mac OS X.

Hope some of this info dump is helpful.

powder avatar Mar 22 '16 15:03 powder

Thanks for looking into this @powder!

There's not much I can contribute to the main issue but I can help you to get it compiling and to set the right terminfo.

So, first issue, can't compile without further investigation that I simply haven't done.

You can manually apply the changes outlined in the [https://github.com/Homebrew/homebrew/blob/master/Library/Formula/dvtm.rb](homebrew formula) if you're building from scratch. I think the only patch needed is under def install.

(On the topic of the formula, I'm not sure if it's best to contribute a patch to homebrew to resolve this or a pull request to @martanne. Hopefully he has time to advise.)

You can specify a $TERM with something like export DVTM_TERM='xterm-256color'. I have this in my .bashrc, I find it works the best with most of the curses software I use in stock terminal.app.

gridsystem avatar Mar 22 '16 15:03 gridsystem

@gridsystem Thanks! I don't know what I did wrong the first time, but since homebrew worked, that was one of the first things I looked at and it failed. I gave it another shot assuming that since someone else is telling me to head that direction I must have done something silly. Lo' and behold, I now have it compiling from source.

powder avatar Mar 22 '16 16:03 powder

The "correct" way to solve the mismatch between $TERM and the actual terminal capabilities is to install the dvtm terminfo description using tic -s dvtm.info (or Mac OS X equivalent thereof) as is normally done during make install

martanne avatar Mar 22 '16 16:03 martanne

@martanne The manual install works of the terminfo description works. It did not appear to successfully do that via homebrew, or when I did CFLAGS=-D_DARWIN_C_SOURCE make "LIBS=-lc -lutil -lncurses" install`

When running via Terminal, the freeze occurs far less often, but it has occurred twice out of a few dozen times launching, playing with dvtm and exiting. The hotkeys are fixed by reassigning them, I chose ^b since I normally use that with tmux.

The behavior is fairly consistent within iterm2, I have yet to have it exit from iterm2 properly.

powder avatar Mar 22 '16 16:03 powder

After spending more time than I should have during my work day, I've come to a verification of what was stated earlier. It will often miss, not receive, or Mac OS X simply does not send a SIGCHLD signal all of the time when you exit. When this happens, it just keeps looping in main endlessly until you kill the process, when you send TERM it happily terminates and goes on it's way. Hitting sigterm_handler, and then the proceeding function calls.

299984 sigterm_handler 299985 is_content_visible 299986 isarrange 299987 isvisible 299988 draw_content 299989 cleanup 299990 destroy 299991 focusnextnm 299992 nextvisible 299993 nextvisible 299994 isvisible 299995 focus 299996 detach 299997 detachstack 299998 nextvisible 299999 quit 300000 cleanup

Prior to those calls it's a very large number of iterations over draw_content, is_content_visible, isarrange, and isvisible. Which is what leads me to believe it's just simply looping through the 'while (running) { .... } in main.

A bit of googling has given me no good information on -why- it might not be sending or receiving the SIGCHLD signal, and unfortunately, I've not been able to capture a log.

This behavior for me occurs MOST often when I open dvtm, and then type exit at the shell prompt.

It happens fairly reliably on both iterm2 and Terminal after further testing. I am so far unable to reproduce the behavior is I use the MOD+x+x hotkey to quit dvtm.

I have not managed to catch a log of it exiting properly with the 'exit' command, but, from memory it is the same as the one from MOD+x+x, but, something about triggering the exit from the hotkey makes it work.

I've not got any sound technical theory on what's causing this behavior, and haven't had time to study it very intimately. Hopefully this information is a bit more concrete in the investigation towards what is going on.

powder avatar Mar 22 '16 18:03 powder

Yes, as mentioned in my first response it is related to the handling of SIGCHLD.

I can give you an overview on how it is supposed to work. Conceptually we want to block and be notified when either a signal occurs (SIGCHLD a client process terminated, SIGWINCH the terminal was resized) or I/O is available either from standard input (user keyboard input), a client process, the statusbar FIFO etc.

This is realized as follows:

  1. block SIGCHLD/SIGWINCH signals
  2. check whether a client has died, if so remove it
  3. call pselect with an empty signalmask which atomically
    • unblocks all signals, hence if a SIGCHLD/SIGWINCH was pending we will now receive it
    • perform the select, block, if a signal occurs select will return -1 with errno set to EINTR (this is crucial, does it happen on Mac OS X?)
    • restore the signal mask i.e. once again block SIGCHLD/SIGWINCH
  4. redraw if necessary
  5. goto 1

See this lwn.net article on why pselect is necessary. However the mentioned race condition should not cause an infinite hang. In the worst case we would block until some other event occurs. For example creating a new window should then remove the old one.

Another thing to investigate is whether the signal handler actually gets called, but for some reason the died property of the client is not set.

When debugging these kind of issues it is extremely useful to get a trace at the syscall layer, as produced by strace(1) on Linux.

Another option is to try to come up with a minimal reproducible test case. How does the following program behave? Run it, then use kill -WINCH <pid> from another terminal, it should print a message related to the signal. Now press c<Enter> this should fork a new process and print another message when it dies.

#define _DARWIN_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#define _XOPEN_SOURCE 700
#define _XOPEN_SOURCE_EXTENDED

#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <stdbool.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/select.h>

volatile sig_atomic_t died = false;
volatile sig_atomic_t resized = false;

void sigchld_handler(int sig) {
    int status;
    pid_t pid;
    printf("SIGCHLD handler\n");
    while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
        if (pid == -1)
            break;
        printf("child with pid %d died\n", pid);
    }
    died = true;
}

void sigwinch_handler(int sig) {
    printf("SIGWINCH handler\n");
    resized = true;
}

int main(int argc, char *argv[]) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = sigchld_handler;
    sigaction(SIGCHLD, &sa, NULL);
    sa.sa_handler = sigwinch_handler;
    sigaction(SIGWINCH, &sa, NULL);
    sigset_t emptyset, blockset;
    sigemptyset(&emptyset);
    sigemptyset(&blockset);
    sigaddset(&blockset, SIGWINCH);
    sigaddset(&blockset, SIGCHLD);
    sigprocmask(SIG_BLOCK, &blockset, NULL);

    printf("parent pid: %d\n", getpid());

    for (;;) {
        int r, nfds = 0;
        fd_set rd;
        FD_ZERO(&rd);
                FD_SET(STDIN_FILENO, &rd);

        if (resized) {
            printf("mainloop: need resize\n");
            resized = false;
        }

        if (died) {
            printf("mainloop: process died\n");
            exit(EXIT_SUCCESS);
        }

        r = pselect(nfds + 1, &rd, NULL, NULL, NULL, &emptyset);

        if (r == -1 && errno == EINTR)
            continue;

        if (r < 0) {
            printf("select() error\n");
            exit(EXIT_FAILURE);
        }

        if (FD_ISSET(STDIN_FILENO, &rd)) {
            char key;
            if (read(STDIN_FILENO, &key, 1) != 1) {
                printf("read error\n");
                continue;
            }
            if (key == 'c') {
                pid_t pid = fork();
                if (pid == -1) {
                    printf("fork failure\n");
                    exit(EXIT_FAILURE);
                }
                if (pid == 0) {
                    sleep(2);
                    exit(EXIT_SUCCESS);
                }
                printf("child with pid %d forked\n", pid);
            } else if (key == 'q') {
                exit(EXIT_SUCCESS);
            } else if (key != '\n') {
                printf("key: %c\n", key);
            }
        }
    }

    return 0;
}

martanne avatar Mar 22 '16 21:03 martanne

From gcc dvtm-test.c -o dvtm-test:

dvtm-test.c:44:15: error: use of undeclared identifier 'SIGWINCH'
    sigaction(SIGWINCH, &sa, NULL);
              ^
dvtm-test.c:48:26: error: use of undeclared identifier 'SIGWINCH'
    sigaddset(&blockset, SIGWINCH);
                         ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/signal.h:122:52: note: expanded from macro 'sigaddset'
#define sigaddset(set, signo)   (*(set) |= __sigbits(signo), 0)
                                                     ^
2 errors generated.

gridsystem avatar Mar 29 '16 09:03 gridsystem

You have to compile with _DARWIN_C_SOURCE (or more specifically the other used flags in the homebrew package) either by adding -D_DARWIN_C_SOURCE to your command line or adding an appropriate #define to the top of the file.

martanne avatar Mar 30 '16 12:03 martanne

Running ./dvtm-test

parent pid: 2022

After kill -WINCH 2022

SIGWINCH handler
mainloop: need resize
c
child with pid 2027 forked
SIGCHLD handler
child with pid 2027 died
mainloop: process died

gridsystem avatar Mar 31 '16 13:03 gridsystem

While looking at this, I noticed that sigchld_handler calls functions (at least vsprintf) which are not async-signal safe, as described in sigaction(2). I changed the signal handler to just set a global flag that is checked in the main loop (like sigwinch_handler works). I've pushed that to my darwin branch if people want to look at it.

Unfortunately, this doesn't resolve the issue. If I change the code to call check_deaths unconditionally, the windows exit as expected, but that seems excessively bigger-hammer.

gmacon avatar Jun 18 '16 20:06 gmacon

I've come up with a smaller program to reproduce the problem: https://gist.github.com/gmacon/e4ff879beaf9a2e3e607fc596dd16c91. It looks like passing some file descriptors to pselect is required; when I called pselect(0, NULL, NULL, NULL, NULL, &sigmask_empty), it always received the SIGCHLD. I'm not sure that this gets us any forwarder, however.

gmacon avatar Aug 03 '16 20:08 gmacon

I've pushed an update to my darwin branch that replaces the use of pselect with the self-pipe trick. So far, this seems to be working for me.

gmacon avatar Aug 04 '16 14:08 gmacon

Thanks for looking into this issue. Unfortunately I don't really have time right now, but I will eventually come back to it. For now a couple of comments.

  1. In dvtm the file descriptor set given to pselect is never empty. It always contains at least stdin.
  2. Even though the non async-signal safe functions are only used in the debug path, moving the code out of the signal handler might be good idea anyway.
  3. I agree that using a self-pipe is a different approach to solve the problem. It might increase portability to some old Unix systems (do we care about those?). However it also needs some care to do properly. For example you don't check the return value of pipe, also the pipe should probably be configured for non-blocking I/O etc.

Out of interest I would like to track down the problem of the signal based approach. This is such basic functionality that it ought to work. We are likely missing something stupid. When I have a bit more time I will try to debug the issue with a minimal test case.

martanne avatar Aug 06 '16 11:08 martanne

Hey! I tried to fix this bug but ultimately only manage to get some information and could use some insights.

On macOS closing a dvtm window result in a frozen UI (the closed window only) and the child process becoming a zombie. The frozen window still respond to dvtm bindings, MOD+q+q is killing it as intended. From what I gather dvtm never get the SIGCHLD so sigchld_handler() is never called proof is the debug line from this function is absent from the log.

Here is when the fun begins, when I set a breakpoint (with lldb) at the beginning of the handler except the break at the point all works correctly. The only odd thing is that the signal value is 20 (SIGSTP) instead of 17 (SIGCHLD). With the same source code on Linux I get the correct signal value.

@martanne I use macOS at work and Linux at home, I really hope this can be fixed soon. I'm a little lost with signal handling (pun intended) but I'm willing to find a solution with a little help.

casimir avatar Mar 06 '17 22:03 casimir

I spent some time this morning trying to understand why we end up in a SIGSTOP to no avail.

However browsing the source of tmux and screen it appears both pass waitpid options of WNOHANG or WUNTRACED. Indeed with the following diff I don't hit this issue.

diff --git a/dvtm.c b/dvtm.c
index 2b7ebdc..1248a59 100644
--- a/dvtm.c
+++ b/dvtm.c
@@ -702,7 +702,7 @@ sigchld_handler(int sig) {
        int status;
        pid_t pid;
 
-       while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
+       while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) != 0) {
                if (pid == -1) {
                        if (errno == ECHILD) {
                                /* no more child processes */

Edit: Reading the tmux and screen sources beyond a simple grep I now see they explicitly continue processes if they are stopped. Indeed with the above diff the shell is lost if stopped when signaled via kill for example, whereas in tmux or screen it is not. So I'll need to add a bit more logic than the overly simple diff above.

Example of where this was introduced in tmux https://github.com/tmux/tmux/commit/62d2ab3e687bfc7e0a02adedee30314b8ef1b08b

g0xA52A2A avatar Oct 29 '18 15:10 g0xA52A2A

Adding |WUNTRACED works for me as well 😺

monokrome avatar Nov 27 '18 01:11 monokrome

I ran into this (or a similar?) problem trying to use dvtm on my macOS 10.14.5 system. I did some investigating, and I think I have made progress toward finding the root cause.

pselect() only sets the signal mask if it blocks. If any of the file descriptors are ready for reading immediately, pselect() returns right away, the signal mask is untouched, and any normally blocked signals are left pending.

When I exit a shell on my Linux machine, the read() in vt_process() returns -1 with errno set to EIO. On my Mac, it returns 0 to indicate EOF. This means that vt_process() also returns 0 and main() never marks the client as dead. main() doesn't destroy the client on the next iteration, and its file descriptor is added to the set for the next pselect().

Because the file descriptor is still in the fd_set and read() wouldn't block on EOF, the pselect() returns immediately, the loop runs forever, and SIGCHLD is left unhandled indefinitely.

I'm not sure if this is the right place to fix this (maybe this difference is related to pty setup?), but I made some small changes to vt_process() and main() so that the client is marked dead when it gets to EOF. Everything seems to be working correctly on both my machines. You can see my changes here: https://github.com/kendallschmit/dvtm/commit/233244fbe854f86ce05d4c697855aca35309cccc

kendallschmit avatar Jun 30 '19 02:06 kendallschmit

The "correct" way to solve the mismatch between $TERM and the actual terminal capabilities is to install the dvtm terminfo description using tic -s dvtm.info (or Mac OS X equivalent thereof) as is normally done during make install

For future reference, this works on macos 11.4. The command is the same:

martanne/dvtm $ tic -s dvtm.info
2 entries written to /Users/stephan/.terminfo

srenatus avatar Jul 04 '21 12:07 srenatus