ppp chat.c deadlocks

chat.c deadlocks when following happens.

calls syslog()
syslog() takes mutex (syslog_lock)
While that mutex is held, signal handler is called
signal handlers in chat.c calls syslog()
syslog() takes mutex (syslog_lock)
deadlock

It seems pppd/main.c circumvent the same problem by not calling syslog(). https://github.com/ppp-project/ppp/commit/7bb5e2cdca9445e1c9ebf3a29efc80ef9d3cbed4

Oct 14 '21 08:10 anolivetree

Interesting. Do you have a way to trigger that scenario at will? It would help in testing a fix if you can supply such a method.

Oct 16 '21 03:10 paulusmack

I'm afraid I don't have a way to trigger the problem at will.

The moble dongle I use is huawei 604HW. https://consumer.huawei.com/jp/mobile-broadband/604hw/specs/

This one notifies the signal strength asynchronously. It seems that when the timeout of AT command and receiving signal strength notification happen at the same time, deadlock occurs.

send AT command
receives signal strength notification. try to syslog() the signal strength
alarm expires because of timeout of AT command. alarm handler calls syslog()
deadlock

Oct 16 '21 14:10 anolivetree

Calling any function that is not listed as async-signal-safe, i.e. see "man 7 signal-safety" on a Linux system should not be used from a signal handler. I've encountered number of permutations where weird things happens to the program and it's stuck waiting for a mutex down in libc. Could be syslog like this instance here, could be fwrite(), tzset() (via e.g. syslog), etc. Trying to do much beyond setting a flag, or using write(fd, ..) is kind of moot from a signal handler.

Oct 21 '21 19:10 enaess

@anolivetree could you try to execute the chat application with a -S (capital "S"), that should turn off the "to syslog" feature. You could specify "-s" -- lowercase "s" to write to stderr and redirect that to a file. However, it will invoke the fprintf() function which is still not going to fix it for you.

Reviewing the code, it seems like making it completely fail-proof from any async-signal-unsafe function is going to be difficult. falal() calls terminate(), and it may flush the terminal and use calls to fprintf() and terminal related functions.

Oct 21 '21 19:10 enaess

@enaess it seems that it doesn't deadlock with -S option. I've tried connecting periorically with and without -S option. With -S, chat didn't deadlock in one hour trial. Without -S, chat deadlocked two times in 30 minutes trial.

Nov 04 '21 05:11 anolivetree

Following is a workaround which blocks signals in msgf() and fatal(). I know it's not optimal but it circumvent the deadlock.

diff --git a/chat/chat.c b/chat/chat.c
index 710dba9..eff7097 100644
--- a/chat/chat.c
+++ b/chat/chat.c
@@ -492,6 +492,23 @@ Usage: %s [-e] [-E] [-v] [-V] [-t timeout] [-r report-file]\n\

 char line[1024];

+void block_catching_signals(sigset_t *oldset)
+{
+    sigset_t set;
+
+    sigemptyset(&set);
+    sigaddset(&set, SIGALRM);
+    sigaddset(&set, SIGINT);
+    sigaddset(&set, SIGTERM);
+    sigaddset(&set, SIGHUP);
+    sigprocmask(SIG_BLOCK, &set, oldset);
+}
+
+void unblock_catching_signals(const sigset_t *oldset)
+{
+    sigprocmask(SIG_SETMASK, oldset, NULL);
+}
+
 /*
  * Send a message to syslog and/or stderr.
  */
@@ -507,11 +524,16 @@ void msgf __V((const char *fmt, ...))
     fmt = va_arg(args, char *);
 #endif

+    sigset_t oldset;
+    block_catching_signals(&oldset);
+
     vfmtmsg(line, sizeof(line), fmt, args);
     if (to_log)
    syslog(LOG_INFO, "%s", line);
     if (to_stderr)
    fprintf(stderr, "%s\n", line);
+
+    unblock_catching_signals(&oldset);
 }

 /*
@@ -532,11 +554,17 @@ void fatal __V((int code, const char *fmt, ...))
     fmt = va_arg(args, char *);
 #endif

+    sigset_t oldset;
+    block_catching_signals(&oldset);
+
     vfmtmsg(line, sizeof(line), fmt, args);
     if (to_log)
    syslog(LOG_ERR, "%s", line);
     if (to_stderr)
    fprintf(stderr, "%s\n", line);
+
+    unblock_catching_signals(&oldset);
+
     terminate(code);
 }

Nov 04 '21 07:11 anolivetree

Your patch would prevent syslog from being called again even from a signal handler given that msgf and fatal are the two functions that actually calls syslog.

In the unlikely event that SIGALRM or SIGINT is triggered when chat is inside a call to fprintf() in libc from signal handler is still going to deadlock, right?
terminate() still uses functions that are not async-signal-safe
if someone commits a change to call syslog a different location of the code in the future, he/she lacks the context in which he also must block these signals.

I wonder if there would be worthwhile to set a global flag and process this in the loop of the main() or do_file() function to indicate it's time to bail. Continue to use these functions from within the signal handler is still going to mask a potential future problem even if it blocks call to syslog.

Note that I am not in a position to test this. I don't have a way to reproduce this problem. Though, I appreciate you taking the time to produce a patch for it.

Nov 04 '21 16:11 enaess

In the unlikely event that SIGALRM or SIGINT is triggered when chat is inside a call to fprintf() in libc from signal handler is still going to deadlock, right?

If don't know if fpritnf takes the same futex lock as syslog(), but if it does it should deadlock.

if someone commits a change to call syslog a different location of the code in the future, he/she lacks the context in which he also must block these signals.

True. That's why I didn't make a pull request because my patch should not be merged into official source code.

I wonder if there would be worthwhile to set a global flag and process this in the loop of the main() or do_file() function to indicate it's time to bail.

It would be a correct way to solve the problem.

I appreciate you taking the time to produce a patch for it.

I don't know if I can prepare a patch anytime soon. If you create a patch, please let me know. I'll test it.

Nov 05 '21 00:11 anolivetree

@anolivetree: Have you looked?

Mar 29 '22 00:03 Neustradamus

@Neustradamus I'm afraid not. I've been busy recently. I hope someone would make a fix for it.

Mar 29 '22 23:03 anolivetree

@anolivetree: Maybe you have looked since some months?

Oct 25 '22 17:10 Neustradamus

I pushed a commit on the 'chatsig' branch in this repository that should fix this problem. Review and/or testing would be welcome.

Nov 28 '22 05:11 paulusmack

I reviewed the 1 change in the branch and it looks good to me. Hopefully if @anolivetree is able to give it a swing to test it then we should be good.

Nov 28 '22 17:11 enaess

@paulusmack has done a PR here:

https://github.com/ppp-project/ppp/pull/375

What do you think?

Dec 06 '22 04:12 Neustradamus

I must have replied his email or something. Did review the change a while back and it looked good to me. Don't have a way to test it though, was hoping that the issue it closes has a collaborator who could help with that.

Dec 06 '22 05:12 enaess

ppp ppp copied to clipboard

chat.c deadlocks

ppp
ppp copied to clipboard