rust
rust copied to clipboard
`std::process::exit` is not thread-safe
The current implementation of std::process::exit is unsound on Linux and possibly related platforms where it defers to libc's exit() function. exit() is documented to be not thread-safe, and hence std::process::exit is not thread-safe either, despite being a non-unsafe function.
To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):
use std::thread;
fn main() {
for _ in 0..32 {
unsafe { libc::atexit(exit_handler) };
}
for _ in 0..2 {
thread::spawn(|| std::process::exit(0));
}
}
extern "C" fn exit_handler() {
thread::sleep_ms(1000);
}
The example contains unsafe code, but only to install exit handlers. AFAICT nothing about the libc::atexit call is unsafe. The UB is introduced by calling std::process::exit concurrently afterwards.
If you are curious, https://github.com/MaterializeInc/materialize/issues/21751#issuecomment-2172783721 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure exit() is thread safe when it's clearly documented not to be. Instead, Rust should either mark std::process::exit as unsafe (which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to call exit() at a time.
This has been discussed previously in https://github.com/rust-lang/rust/issues/83994. There the decision was to not do anything about the concerns with std::process::exit, mostly based on the reasoning that exit() is thread safe. The above example shows it's clearly not, neither according to the documentation nor in the implementation.
Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent std::process::exit invocations I feel quite strongly that this issue deserves reconsideration.
On that thread joshtriplett notes:
The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.
We do now mark std::env::set_var as unsafe so maybe thinking on this has changed. Though I'm not sure how you would mark returning from main as being unsafe?
Linux libc documentation does claim that exit() is thread-unsafe: "MT-Unsafe race:exit", although the original thread suggests that all libc implementations introduce appropriate locks. Perhaps introducing a Once before calling libc::exit is sufficient and not a performance blocker?
Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing.
https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.
I'd guess the safe thing to do is kill all other threads then call exit.
Not sure if serious...
Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking.
Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.
I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit on different threads.
This also means that Rust cdylibs mustn't call std::process::exit since they might not be the only language running, and applications must ensure that no foreign functions call exit. The latter should be a bug anyway due to the C standard saying that multiple calls to exit are UB.
https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.
Based on how the glibc code is set up it very much looks like exit isn't intended to be thread safe. It uses a lock, but releases it every time it calls one of the exit handlers. That'd make no sense if it wanted to prevent other threads from modifying the list of exit handlers concurrently. The lock is useful for preventing corruption in the face of other threads calling atexit in parallel. There is a comment that says that this is its intended purpose (though admittedly it could be clearer).
I think the previous assumption that glibc's exit is thread-safe was a misconception based on the existence of this lock alone.
Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.
That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list.
Triage: marking this as I-unsound, but please readjust the label accordingly.
This comment from the main author of musl was pointed out in the libs-api meeting:
POSIX requires exit be thread-safe; it is not one of the functions listed in the exceptions, and all functions are required to be thread-safe by default. My understanding is that C11 also requires this.
I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01
All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions1 need not be thread-safe.
exit is not on the list.
According to the C11 standard, calling exit more than once is undefined behavior.
C11 7.22.4.4p2
If a program calls the
exitfunction more than once, or calls thequick_exitfunction in addition to theexitfunction, the behavior is undefined.
Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11.
The POSIX description also contains that sentence.
https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html
If
exit()is called more than once, the behavior is undefined.
So the POSIX spec is kind of in conflict with itself?
@tbu- See the latter half of https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574
If there's any proper action to be had here, it's on the C side. The text about calling
exitmore than once being UB was written in the absence of threads and was obviously intended to make recursiveexit(viaatexithandlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.
It makes sense to fix this on the C side, too.
Currently, we have process::exit causing practical UB on a tier 1 platform, it's not just a hypothetical issue, so I think something like #126606 also makes sense. Basically making sure exit() is only called once, and other threads calling exit simply going to sleep forever (until they're cleaned up by the OS).
This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.
Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report.
code
#include <stdlib.h>
#include <threads.h>
void exit_handler(void) {
thrd_sleep(&(struct timespec){.tv_sec = 1}, NULL);
}
int thread_function(void *arg) {
(void)arg;
exit(0);
}
int main(void) {
for (int i = 0; i < 32; ++i) {
atexit(exit_handler);
}
for (int i = 0; i < 2; ++i) {
thrd_t thread;
thrd_create(&thread, thread_function, NULL);
}
}
I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's
exiton different threads.
This is definitely the wrong way to go, we just finished the fight with set_var, let's not repeat it.
This is definitely the wrong way to go, we just finished the fight with
set_var, let's not repeat it.
I think this case is different.
Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to exit (but they are expected to getenv).
The problem with safe setenv was that libraries were expected to getenv without taking a lock.
This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.
I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for exit(3) says clearly:
The exit() function uses a global variable that is not protected, so it is not thread-safe.
So glibc is at least conforming to its own documentation.
As @richfelker said in https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574, I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback.
This is definitely the wrong way to go, we just finished the fight with
set_var, let's not repeat it.I think this case is different.
Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to
exit(but they are expected togetenv).The problem with safe
setenvwas that libraries were expected togetenvwithout taking a lock.
I agree. Adding a lock around std::process::exit seems like a no downside change. As far as I can tell, doing so entirely eliminates the unsoundness. If it's later determined that there's an error in that reasoning and std::process::exit must be marked unsafe, the existence of the lock doesn't make that transition any more difficult. And the happy case is that the POSIX and C folks agree that exit should be safe to call from multiple threads simultaneously, and Rust can remove the lock for platforms with exit implementations that are known to follow the new spec.
I agree that adding a lock is fine and would solve this issue. I am slightly concerned about the possibility of deadlocks but I think it shouldn't cause any.
😕 How does it fix the issue? Doesn't it suffer from the same problem as the environment lock, that C code can still race?
C code that follows the rules wouldn't call exit in cases where Rust could call it anyway. Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues. This is unlike the setenv issue where C code that follows the rules (doesn't call setenv once any thread is spawned) is enough to cause UB when calling setenv from the Rust side. And the setenv issue is fundamentally impossible to make safe due to the guarantees POSIX provides around getenv and the environ global, while exit is trivially to make safe by having libc add a lock.
I don't expect C code to follow the rules because following the rules isn't reasonable here. Linking python (or java?) and rust code was an example that was brought up in environ discussion and this applies here too.
So imo it's at best a partial bandaid, not a fix.
It's a fix for the Rust side. libc and other libraries don't call exit in random places (unlike getenv).
It's partial bandaid for the general problem, but theoretically, libraries shouldn't call exit anyway (again, unlike getenv). I feel like the situation looks much better than the env lock.
So imo it's at best a partial bandaid, not a fix.
It is a full fix for this issue. The issue here is that a well behaved Rust program can cause memory unsafety by calling the std::process::exit function from multiple threads. This means that std::process::exit function is currently unsound. It should not be possible to cause memory unsafety in Rust by calling a safe function.
Adding a lock inside of std::process::exit fully fixes this issue, as it will no longer be possible for a well behaved Rust program to cause memory unsafety by calling std::process::exit.
The issue is not that C programs can cause memory unsafety by calling the libc exit function from multiple threads. POSIX is clear: calling libc's exit function twice is undefined behavior. Libraries that call libc's exit function must only do so if they have arranged to ensure that they are the only thread that will call exit. How C libraries (or libraries in other languages that call libc's exit function) arrange to do that is out of scope for Rust.
I don't expect C code to follow the rules because following the rules isn't reasonable here.
I think it's the other way around: the only reasonable approach to writing safe C code is to mandate that C code follows the rules. Safe C code must not trigger undefined behavior. Any C library that wants to call exit safely needs to somehow ensure that no other thread could possibly cause exit to be called after the library calls exit. For example, the library could declare itself to be thread-unsafe, or it could clearly document the caller's responsibilities on a function that calls exit, like so:
/** Terminates the current process with exit code 0.
*
* Calling this function concurrently with another thread that calls `exit`
* will trigger undefined behavior. If you call this function, you MUST ensure
* that no other threads can possibly call `exit` after this function is called.
*/
void mylib_terminate(void) {
printf("mylib requested to exit process\n");
exit(0);
}
It sounds like there is some appetite for adjusting the C standard to allow calling exit more than once, but until that happens, I don't think there is a Rust issue here. If C code needs to call exit, it must somehow arrange to ensure that exit is not called more than once, but how that happens is out of scope for Rust.
I'll repeat @RalfJung's argument https://github.com/rust-lang/rust/issues/83994#issuecomment-1430358346
This isn't about just C having UB. It's about libc/posix (not C the language) being the foundation on which most other languages build. By necessity, on platforms where raw syscalls are not available. Safe languages still expose means to exit the process that end up calling C's exit(). One of Rust's aims is to be able to interoperate with such languages.
We cannot claim that it's merely some sketchy C programs that are holding it wrong. We are building on a foundation of sand. Libraries that claim to provide safe interop with safe languages are exposing UB.
process::exit does not even have a correctness requirement that it should be called from bin crates and not from libraries. Neither do python's or java's exit(). And of course zero safety requirements, since it's a safe method.
It sounds like there is some appetite for adjusting the C standard to allow calling exit more than once, but until that happens, I don't think there is a Rust issue here.
We can't promise UB-free language interop as long as the underlying issue is not fixed.
@the8472 You haven't responded to the statement that this fix makes pure-Rust programs linking glibc on Linux not have UB on parallel std::process::exit.
You're correct that we can't fix the ecosystem problem without fixing the C standard and/or POSIX.