server icon indicating copy to clipboard operation
server copied to clipboard

MDEV-34129 mariadb-install-db appears to hang on macOS

Open DaveGosselin-MariaDB opened this issue 1 year ago • 1 comments
trafficstars

A bug in signal_handler thread initialization led to an overcomplicated implementation of wait_for_signal_handler_to_end, namely that we could fail to initialize the signal handler but mark it as active anyway. This meant that we could wait for it to terminate when it didn't exist in the first place.

Additionally, let's immediately close down the signal handler loop when we decide to break connections--it's the start of process termination anyway, and there's no need to wait once we've invoked break_connections.

DaveGosselin-MariaDB avatar May 13 '24 15:05 DaveGosselin-MariaDB

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 13 '24 15:05 CLAassistant

I don't understand the fix. What do you mean by "we could fail to initialize the signal handler but mark it as active anyway"? How could it have happened?

The signal handler will fail to initialize if my_thread_global_init_done is false, which is possible and expected in some of our tests (I don't recall which at the moment, but I think in certain plugins).

DaveGosselin-MariaDB avatar May 29 '24 20:05 DaveGosselin-MariaDB

I asked specifically because you did not remove that part of the test. It's line 95

Ahh... this was an oversight on my part, I removed that in the latest diff.

DaveGosselin-MariaDB avatar Jun 06 '24 20:06 DaveGosselin-MariaDB

Your commit essentially consists of three changes

  • remove the loop in wait_for_signal_thread_to_end()
  • handle the case when my_thread_init() fails
  • terminate immediately on SIGTERM

The first looks rather risky to me. The second — I don't really believe my_thread_init() can fail (and I tested it, mysqld hung all right, but my_thread_init() succeeded). And the third one — looks rather clear and safe, and — incidentally — it's the only change that is needed to fix the bug. With that hunk only the server doesn't hang for me anymore.

Let's only do the third part for now?

vuvova avatar Jun 10 '24 12:06 vuvova

Your commit essentially consists of three changes

Yes, this is true.

  • remove the loop in wait_for_signal_thread_to_end()
  • handle the case when my_thread_init() fails
  • terminate immediately on SIGTERM

The first looks rather risky to me.

If it seems risky, then can we improve the automated tests for this part of our system to reduce that risk?

The second — I don't really believe my_thread_init() can fail

If this is true, then should we change my_thread_init() to have a void return type?

the third one — looks rather clear and safe, and — incidentally — it's the only change that is needed to fix the bug. With that hunk only the server doesn't hang for me anymore. Let's only do the third part for now?

Yes, no problem. I will update the PR to include only that change. I retested locally both on macOS and FreeBSD and can see that it's sufficient.

DaveGosselin-MariaDB avatar Jun 10 '24 13:06 DaveGosselin-MariaDB