dvtm
dvtm copied to clipboard
Closing a shell causes window to hang on OS X 10.11
There's several ways to get to this frozen shell problem since upgrading to 10.11. I don't know enough about C to help, but I would take a wild guess that it relates to OS X's new SIPS.
- Quit a window with the key binding (e.g. in my case
C-d q
) - Type exit at a shell
- Open and then close the buffer mode pager thingy (e.g. in my case
C-d e
to open a scrollback buffer in less which I gather has something to do with copying and pasting but I just use it for scrolling)
This causes the window to basically freeze, In essence I can spawn windows but not close them.
On @martanne's advice I ran the following after intentionally causing the issue.
Which seemed to point to this line of code.
I've tried this with clean user configs (bashrc, profile etc), on two different machines and with clean installs of dvtm using the unmodified default config.
Hope someone can help because I've been using this thing daily for years now and I really miss it!
And in addition, this happens with at least 0.13, 0.14 and 0.15.
And this is what it looks like. The top window has frozen after printing 'Exit'.
@martanne is there anything I can do to help here? I'm holding off on submitting a pull request for the homebrew formula until it works incase it needs another patch, but I guess I can raise an issue there and see if anyone can lend a hand?
As I told you I'm not really familiar with Mac OS X and have no system at hand to debug it. Did anyone else encounter the same issue?
I guess for some reason the SIGCHLD
which is sent to dvtm when a child process terminates is lost.
What happens when you run some other process instead of a shell? I.e. start dvtm as in dvtm python
which should drop you into a python interpreter. Type CTRL-D
to exit python. Is dvtm stuck? Does the old window still exist? Can you create new windows?
Try compiling dvtm with make debug
and run it with dvtm 2> log
in another terminal run tail -f log
do you see any messages of the form "child with pid NNN died"?
Also what state are processes in after you type exit? Use something like pstree
to find the process id of the hanging process (obviously before typing exit). Check its status with ps PID
then type exit and do the same again.
Did anyone else encounter the same issue?
Same behavior here on OS X 10.11. I'll gather answers to your questions and report back.
What happens when you run some other process instead of a shell? Type CTRL-D to exit python. Is dvtm stuck?
No, if I don't attempt to open other windows. It waits for input before redrawing the terminal after exit but otherwise okay.
Does the old window still exist? Can you create new windows?
Can create new windows. Seems to hang as soon as I exit one of these. The other windows exist but similarly hang when attempting to exit.
do you see any messages of the form "child with pid NNN died"?
Infrequently a window will terminate successfully and do I see "child with pid NNN died" in the log. I do not see it when the window hangs, which is the overwhelming majority of the time.
Also what state are processes in after you type exit?
Before:
PID TT STAT TIME COMMAND
8588 s008 Ss+ 0:00.02 /bin/zsh
After:
PID TT STAT TIME COMMAND
8588 ?? Z 0:00.00 (zsh)
I've created a bounty for this issue! Please share it if you know anyone who can contribute to Mac compatibility :)
I decided to give dvtm a try and see if I could diagnose/fix the issue. There's more problems than just this on Mac OS X. It doesn't compile out of the box, giving me an error about SIGWINCH. So, first issue, can't compile without further investigation that I simply haven't done. Alright, well, let's just see if I can replicate the issue. So, I download the binary with homebrew, it installs fine, it runs. Issues immediately.
It doesn't seem to understand backspace. Instead, just printing a blank space (it's actually printing a space instead of the ^H character representing a backspace and failing to do properly erase the letter on the terminal itself, hitting return does indeed indicate that the backspaced corrections occured, so it's just a display error. This will be related to the default terminal setting that dvtm exports of dvtm-256color not being defined on Mac OS X anywhere. export TERM=screen-256color resolves this, but, is technically an incorrect solution.
When running dvtm from inside of iterm2, it exhibits the freezing issue mentioned in this issue every time I've tried it. There seems to be no intermitten portion to the issue when running it under those conditions. When I run it from 'terminal' it still exhibits the backspace issue, but the freezing no longer occurs and the terminal exits cleanly. None of the hotkeys appear to work to create or tile more windows after launching dvtm from either application.
Will look into it more later than just a passing try to compile and run after work, but, as just a quick glance, there's more issues to be had when attempting to run under Mac OS X.
Hope some of this info dump is helpful.
Thanks for looking into this @powder!
There's not much I can contribute to the main issue but I can help you to get it compiling and to set the right terminfo.
So, first issue, can't compile without further investigation that I simply haven't done.
You can manually apply the changes outlined in the [https://github.com/Homebrew/homebrew/blob/master/Library/Formula/dvtm.rb](homebrew formula) if you're building from scratch. I think the only patch needed is under def install
.
(On the topic of the formula, I'm not sure if it's best to contribute a patch to homebrew to resolve this or a pull request to @martanne. Hopefully he has time to advise.)
You can specify a $TERM with something like export DVTM_TERM='xterm-256color'
. I have this in my .bashrc, I find it works the best with most of the curses software I use in stock terminal.app.
@gridsystem Thanks! I don't know what I did wrong the first time, but since homebrew worked, that was one of the first things I looked at and it failed. I gave it another shot assuming that since someone else is telling me to head that direction I must have done something silly. Lo' and behold, I now have it compiling from source.
The "correct" way to solve the mismatch between $TERM
and the actual terminal capabilities is to install the dvtm terminfo description using tic -s dvtm.info
(or Mac OS X equivalent thereof) as is normally done during make install
@martanne The manual install works of the terminfo description works. It did not appear to successfully do that via homebrew, or when I did CFLAGS=-D_DARWIN_C_SOURCE
make "LIBS=-lc -lutil -lncurses" install`
When running via Terminal, the freeze occurs far less often, but it has occurred twice out of a few dozen times launching, playing with dvtm and exiting. The hotkeys are fixed by reassigning them, I chose ^b since I normally use that with tmux.
The behavior is fairly consistent within iterm2, I have yet to have it exit from iterm2 properly.
After spending more time than I should have during my work day, I've come to a verification of what was stated earlier. It will often miss, not receive, or Mac OS X simply does not send a SIGCHLD signal all of the time when you exit. When this happens, it just keeps looping in main endlessly until you kill the process, when you send TERM it happily terminates and goes on it's way. Hitting sigterm_handler, and then the proceeding function calls.
299984 sigterm_handler 299985 is_content_visible 299986 isarrange 299987 isvisible 299988 draw_content 299989 cleanup 299990 destroy 299991 focusnextnm 299992 nextvisible 299993 nextvisible 299994 isvisible 299995 focus 299996 detach 299997 detachstack 299998 nextvisible 299999 quit 300000 cleanup
Prior to those calls it's a very large number of iterations over draw_content, is_content_visible, isarrange, and isvisible. Which is what leads me to believe it's just simply looping through the 'while (running) { .... } in main.
A bit of googling has given me no good information on -why- it might not be sending or receiving the SIGCHLD signal, and unfortunately, I've not been able to capture a log.
This behavior for me occurs MOST often when I open dvtm, and then type exit at the shell prompt.
It happens fairly reliably on both iterm2 and Terminal after further testing. I am so far unable to reproduce the behavior is I use the MOD+x+x hotkey to quit dvtm.
I have not managed to catch a log of it exiting properly with the 'exit' command, but, from memory it is the same as the one from MOD+x+x, but, something about triggering the exit from the hotkey makes it work.
I've not got any sound technical theory on what's causing this behavior, and haven't had time to study it very intimately. Hopefully this information is a bit more concrete in the investigation towards what is going on.
Yes, as mentioned in my first response it is related to the handling of SIGCHLD
.
I can give you an overview on how it is supposed to work. Conceptually we want to block and be notified when either a signal occurs (SIGCHLD
a client process terminated, SIGWINCH
the terminal was resized) or I/O is available either from standard input (user keyboard input), a client process, the statusbar FIFO etc.
This is realized as follows:
- block SIGCHLD/SIGWINCH signals
- check whether a client has died, if so remove it
- call
pselect
with an empty signalmask which atomically- unblocks all signals, hence if a
SIGCHLD/SIGWINCH
was pending we will now receive it - perform the
select
, block, if a signal occurs select will return -1 witherrno
set toEINTR
(this is crucial, does it happen on Mac OS X?) - restore the signal mask i.e. once again block
SIGCHLD/SIGWINCH
- unblocks all signals, hence if a
- redraw if necessary
- goto 1
See this lwn.net article on why pselect is necessary. However the mentioned race condition should not cause an infinite hang. In the worst case we would block until some other event occurs. For example creating a new window should then remove the old one.
Another thing to investigate is whether the signal handler actually gets called, but for some reason the died
property of the client is not set.
When debugging these kind of issues it is extremely useful to get a trace at the syscall layer, as produced by strace(1)
on Linux.
Another option is to try to come up with a minimal reproducible test case. How does the following program behave? Run it, then use kill -WINCH <pid>
from another terminal, it should print a message related to the signal. Now press c<Enter>
this should fork a new process and print another message when it dies.
#define _DARWIN_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#define _XOPEN_SOURCE 700
#define _XOPEN_SOURCE_EXTENDED
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <stdbool.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/select.h>
volatile sig_atomic_t died = false;
volatile sig_atomic_t resized = false;
void sigchld_handler(int sig) {
int status;
pid_t pid;
printf("SIGCHLD handler\n");
while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
if (pid == -1)
break;
printf("child with pid %d died\n", pid);
}
died = true;
}
void sigwinch_handler(int sig) {
printf("SIGWINCH handler\n");
resized = true;
}
int main(int argc, char *argv[]) {
struct sigaction sa;
memset(&sa, 0, sizeof sa);
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sa.sa_handler = sigchld_handler;
sigaction(SIGCHLD, &sa, NULL);
sa.sa_handler = sigwinch_handler;
sigaction(SIGWINCH, &sa, NULL);
sigset_t emptyset, blockset;
sigemptyset(&emptyset);
sigemptyset(&blockset);
sigaddset(&blockset, SIGWINCH);
sigaddset(&blockset, SIGCHLD);
sigprocmask(SIG_BLOCK, &blockset, NULL);
printf("parent pid: %d\n", getpid());
for (;;) {
int r, nfds = 0;
fd_set rd;
FD_ZERO(&rd);
FD_SET(STDIN_FILENO, &rd);
if (resized) {
printf("mainloop: need resize\n");
resized = false;
}
if (died) {
printf("mainloop: process died\n");
exit(EXIT_SUCCESS);
}
r = pselect(nfds + 1, &rd, NULL, NULL, NULL, &emptyset);
if (r == -1 && errno == EINTR)
continue;
if (r < 0) {
printf("select() error\n");
exit(EXIT_FAILURE);
}
if (FD_ISSET(STDIN_FILENO, &rd)) {
char key;
if (read(STDIN_FILENO, &key, 1) != 1) {
printf("read error\n");
continue;
}
if (key == 'c') {
pid_t pid = fork();
if (pid == -1) {
printf("fork failure\n");
exit(EXIT_FAILURE);
}
if (pid == 0) {
sleep(2);
exit(EXIT_SUCCESS);
}
printf("child with pid %d forked\n", pid);
} else if (key == 'q') {
exit(EXIT_SUCCESS);
} else if (key != '\n') {
printf("key: %c\n", key);
}
}
}
return 0;
}
From gcc dvtm-test.c -o dvtm-test
:
dvtm-test.c:44:15: error: use of undeclared identifier 'SIGWINCH'
sigaction(SIGWINCH, &sa, NULL);
^
dvtm-test.c:48:26: error: use of undeclared identifier 'SIGWINCH'
sigaddset(&blockset, SIGWINCH);
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/signal.h:122:52: note: expanded from macro 'sigaddset'
#define sigaddset(set, signo) (*(set) |= __sigbits(signo), 0)
^
2 errors generated.
You have to compile with _DARWIN_C_SOURCE
(or more specifically the other used flags in the homebrew package) either by adding -D_DARWIN_C_SOURCE
to your command line or adding an appropriate #define
to the top of the file.
Running ./dvtm-test
parent pid: 2022
After kill -WINCH 2022
SIGWINCH handler
mainloop: need resize
c
child with pid 2027 forked
SIGCHLD handler
child with pid 2027 died
mainloop: process died
While looking at this, I noticed that sigchld_handler
calls functions (at least vsprintf
) which are not async-signal safe, as described in sigaction(2)
. I changed the signal handler to just set a global flag that is checked in the main loop (like sigwinch_handler
works). I've pushed that to my darwin branch if people want to look at it.
Unfortunately, this doesn't resolve the issue. If I change the code to call check_deaths
unconditionally, the windows exit as expected, but that seems excessively bigger-hammer.
I've come up with a smaller program to reproduce the problem: https://gist.github.com/gmacon/e4ff879beaf9a2e3e607fc596dd16c91. It looks like passing some file descriptors to pselect
is required; when I called pselect(0, NULL, NULL, NULL, NULL, &sigmask_empty)
, it always received the SIGCHLD
. I'm not sure that this gets us any forwarder, however.
I've pushed an update to my darwin branch that replaces the use of pselect with the self-pipe trick. So far, this seems to be working for me.
Thanks for looking into this issue. Unfortunately I don't really have time right now, but I will eventually come back to it. For now a couple of comments.
- In dvtm the file descriptor set given to
pselect
is never empty. It always contains at leaststdin
. - Even though the non async-signal safe functions are only used in the debug path, moving the code out of the signal handler might be good idea anyway.
- I agree that using a self-pipe is a different approach to solve the problem. It might increase portability to some old Unix systems (do we care about those?). However it also needs some care to do properly. For example you don't check the return value of
pipe
, also the pipe should probably be configured for non-blocking I/O etc.
Out of interest I would like to track down the problem of the signal based approach. This is such basic functionality that it ought to work. We are likely missing something stupid. When I have a bit more time I will try to debug the issue with a minimal test case.
Hey! I tried to fix this bug but ultimately only manage to get some information and could use some insights.
On macOS closing a dvtm window result in a frozen UI (the closed window only) and the child process becoming a zombie. The frozen window still respond to dvtm bindings, MOD+q+q
is killing it as intended.
From what I gather dvtm never get the SIGCHLD so sigchld_handler()
is never called proof is the debug line from this function is absent from the log.
Here is when the fun begins, when I set a breakpoint (with lldb) at the beginning of the handler except the break at the point all works correctly. The only odd thing is that the signal value is 20 (SIGSTP
) instead of 17 (SIGCHLD
). With the same source code on Linux I get the correct signal value.
@martanne I use macOS at work and Linux at home, I really hope this can be fixed soon. I'm a little lost with signal handling (pun intended) but I'm willing to find a solution with a little help.
I spent some time this morning trying to understand why we end up in a SIGSTOP
to no avail.
However browsing the source of tmux and screen it appears both pass waitpid
options of WNOHANG
or WUNTRACED
. Indeed with the following diff I don't hit this issue.
diff --git a/dvtm.c b/dvtm.c
index 2b7ebdc..1248a59 100644
--- a/dvtm.c
+++ b/dvtm.c
@@ -702,7 +702,7 @@ sigchld_handler(int sig) {
int status;
pid_t pid;
- while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
+ while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) != 0) {
if (pid == -1) {
if (errno == ECHILD) {
/* no more child processes */
Edit: Reading the tmux and screen sources beyond a simple grep I now see they explicitly continue processes if they are stopped. Indeed with the above diff the shell is lost if stopped when signaled via kill for example, whereas in tmux or screen it is not. So I'll need to add a bit more logic than the overly simple diff above.
Example of where this was introduced in tmux https://github.com/tmux/tmux/commit/62d2ab3e687bfc7e0a02adedee30314b8ef1b08b
Adding |WUNTRACED
works for me as well 😺
I ran into this (or a similar?) problem trying to use dvtm on my macOS 10.14.5 system. I did some investigating, and I think I have made progress toward finding the root cause.
pselect()
only sets the signal mask if it blocks. If any of the file descriptors are ready for reading immediately, pselect()
returns right away, the signal mask is untouched, and any normally blocked signals are left pending.
When I exit a shell on my Linux machine, the read()
in vt_process()
returns -1
with errno
set to EIO
. On my Mac, it returns 0
to indicate EOF
. This means that vt_process()
also returns 0
and main()
never marks the client as dead. main()
doesn't destroy the client on the next iteration, and its file descriptor is added to the set for the next pselect()
.
Because the file descriptor is still in the fd_set
and read()
wouldn't block on EOF
, the pselect()
returns immediately, the loop runs forever, and SIGCHLD
is left unhandled indefinitely.
I'm not sure if this is the right place to fix this (maybe this difference is related to pty setup?), but I made some small changes to vt_process()
and main()
so that the client is marked dead when it gets to EOF
. Everything seems to be working correctly on both my machines. You can see my changes here: https://github.com/kendallschmit/dvtm/commit/233244fbe854f86ce05d4c697855aca35309cccc
The "correct" way to solve the mismatch between $TERM and the actual terminal capabilities is to install the dvtm terminfo description using tic -s dvtm.info (or Mac OS X equivalent thereof) as is normally done during make install
For future reference, this works on macos 11.4. The command is the same:
martanne/dvtm $ tic -s dvtm.info
2 entries written to /Users/stephan/.terminfo