oneTBB
oneTBB copied to clipboard
Retry if pthread_create fails with EAGAIN
Description
On many Unix-like systems, pthread_create can fail spuriously even if the running machine has enough resources to spawn a new thread. Therefore, if EAGAIN is returned from pthread_create, we actually have to try again.
I observed this issue when running the mold linker (https://github.com/rui314/mold) under a heavy load. mold uses OneTBB for parallelization.
As another data point, Go has the same logic to retry on EAGAIN: https://go-review.googlesource.com/c/go/+/33894/
nanosleep is defined in POSIX 2001, so I believe that all Unix-like systems support it.
- [x] - git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
Type of change
- [x] bug fix - change that fixes an issue
- [ ] new feature - change that adds functionality
- [ ] tests - change in tests
- [ ] infrastructure - change in infrastructure and CI
- [ ] documentation - documentation update
Tests
- [ ] added - required for new features and some bug fixes
- [x] not needed
Documentation
- [ ] updated in # - add PR number
- [ ] needs to be updated
- [x] not needed
Breaks backward compatibility
- [ ] Yes
- [x] No
- [ ] Unknown
Notify the following users
Other information
Ping?
May I please ping this?
Would it be possible to land this ? thanks
~~This seems to cause one of the tests to fail:~~
62/136 Testing: test_eh_thread
62/136 Test: test_eh_thread
Command: "/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1_build-abi_x86_64.amd64/gnu_11.3_cxx11_64_relwithdebinfo/test_eh_t
hread" "--force-colors=1"
Directory: /var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1_build-abi_x86_64.amd64/gnu_11.3_cxx11_64_relwithdebinfo
"test_eh_thread" start time: Aug 19 16:44 BST
Output:
----------------------------------------------------------
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
===============================================================================
/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1/test/tbb/test_eh_thread.cpp:90:
TEST CASE: Too many threads
/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1/test/tbb/test_eh_thread.cpp:90: ERROR: test case THREW exception: Resource temporarily unavailable
===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 2 | 2 passed | 0 failed |
[doctest] Status: FAILURE!
<end of output>
Test time = 0.00 sec
----------------------------------------------------------
Test Failed.
"test_eh_thread" end time: Aug 19 16:44 BST
"test_eh_thread" time elapsed: 00:00:00
----------------------------------------------------------
~~It seems like this test is supposed to check for the case where there are too many threads and pthread_create returns EAGAIN, but I'm not sure what to look at following that to debug. Maybe thread_monitor::launch is now throwing the wrong kind of exception so the test isn't catching it?~~
Sorry this test might just be flakey/failing randomly on my system... Please ignore, will do more testing :)
Can someone in the TBB team review and merge this patch? More and more Linux distros are cherrypicking this as an unofficial patch.
@rui314 According to https://man7.org/linux/man-pages/man3/pthread_create.3.html
EAGAIN A system-imposed limit on the number of threads was encountered. There are a number of limits that may trigger this error: the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached; the kernel's system-wide limit on the number of processes and threads, /proc/sys/kernel/threads-max, was reached (see proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)).
Or here https://pubs.opengroup.org/onlinepubs/9699919799/
[EAGAIN] The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process {PTHREAD_THREADS_MAX} would be exceeded.
So this means that the system either lacks the necessary resources, or has been set some system limit.
Is there any thread, article or bug report where mentioned that pthread_create can fail even with having enough resources or respecting a set limit?
I've seen pthread_create fail with EAGAIN on busy systems, without reaching any user or kernel limit I was aware of. I suspect @rui314 has a collection of bug reports like this.
Also if this is some particular problem/bug I would expect to wrap the solution into macros. In my point of view this particular workaround looks strange because OS reported that there are not enough resources but we spins and wait for something. (It is counter intuitive)
There seems to be some discussions on Go discussion board, which you can visit from https://go-review.googlesource.com/c/go/+/33894/. Note that Go does the same thing as this is. I also got many bug reports caused by the spurious failure of pthread_create.
I can wrap this with a function (not a macro because it's bad for readability), but since this function itself is a wrapper function to call pthread_create, wrapping it with one more function doesn't seems like a good idea.
The macro could showed that this is known problem, because as I said this behavior is counter intuitive (and pthread_create API). Could you also please add a test case that will reproduce this problem?
We need to count the number of iterations while retrying, so macro wouldn't be a good choice. Let me factor it out as a function.
I factored out the code to a new function and wrote a test.
@rui314 I tried reproduce this failure with test you proposed and it didn't fail. Are there special conditions that should be met to reproduce it? Also, I think if proposed solution will be used on system with real limit we might face with situation when thread will wait on this loop instead useful work.
@rui314 I tried reproduce this failure with test you proposed and it didn't fail. Are there special conditions that should be met to reproduce it?
The test cannot reproduce the issue unless your system is under a heavy load, and even under a heavy load, the probably that the problem occur is very low. I don't think you can see a failure easily.
On the other hand, that also means all applications that currently use OneTBB is subtly unstable and experiencing occasional mysterious crash under a heavy load, which is I believe a critical issue, though.
Also, I think if proposed solution will be used on system with real limit we might face with situation when thread will wait on this loop instead useful work.
Correct. It can cause a delay up to 210 milliseconds for a real failure that returns EAGAIN. That's unfortunately unavoidable as a spurious failure is indistinguishable from non-spurious ones. Hopefully, if a system is so low on resource that it returns a real EAGAIN error, it's sluggish and the 210 milliseconds delay doesn't matter much.
The test cannot reproduce the issue unless your system is under a heavy load, and even under a heavy load, the probably that the problem occur is very low. I don't think you can see a failure easily.
I can quite stably reproduce that when using mold linker in GCC test suite on a pretty modern AMD Zen system. When running 16 parallel tests each spawning 16 threads in mold.
Accordance with Dmitry Vyukov tweet it is general problem with pthread_create. Is there issue that was created on glibc maintainers? (As I understand this issue persist for more than 6 years already) I also found out that some other projects use retry loop on pthread_create. But the amount of retry iterations depends on project. Were there any checked performed to pick retry constant = 20 or it is copy from Go? We will discuss the way how oneTBB should handle this particular problem. It could be graceful degradation or some retry loop.
I don't know if there's a bug filed to glibc for this particular issue. glibc's pthread_create(2) is after all just a wrapper to the system call, so it's not their problem, no?
Speaking of the retry count, 20 is copied from Go. I don't know if it is the best max retry count, but it should at least be battle-tested in the wild.
glibc's pthread_create(2) is after all just a wrapper to the system call, so it's not their problem, no?
Uh, what? pthread_create does a whole lot of stuff, for example apply the pthread_attr_t, allocate a stack, allocate thread local storage, and various stuff I don't know what it means. https://github.com/bminor/glibc/blob/master/nptl/pthread_create.c#L619
Even glibc clone() isn't a straight syscall wrapper; glibc clone() calls a function in the child, but kernel clone() returns twice. https://man7.org/linux/man-pages/man2/clone.2.html#NOTES https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/clone.S
Sorry for noise, I wrote a bit compact code for same, which does not call nanosleep() with zeros on first error.
static int
pthread_create_eagain(pthread_t *handle, const pthread_attr_t *attr,
void *(*fn)(void*), void *arg) {
int error;
const size_t max_tries = 20;
struct timespec rqts = {0, 0};
for (size_t i = 1; i <= max_tries; i ++) {
error = pthread_create(handle, attr, fn, arg);
if (0 == error || /* Ok, done. */
EAGAIN != error) /* Other error. */
return (error);
/* Retry after tries * 1 millisecond. */
rqts.tv_nsec = (i * 1000 * 1000);
nanosleep(&rqts, NULL);
}
return (EAGAIN);
}
@rui314 Could you please apply @rozhuk-im proposal and remove test (it doesn't reproduce the problem)?
@pavelkumbrasev Do you want me to remove #ifdef __linux__? There's no guarantee that other systems will never fail with the same spurious error.
Simplified the code and removed the test.
Actually the original code was better. I didn't call nanosleep with 0 milliseconds delay, and the new code introduced an unnecessary 20 milliseconds sleep if all pthread_create failed. I'll roll it back.
Could you please check why test is failing?

I don't know why, but that test fails even without my change.
I didn't call nanosleep with 0 milliseconds delay,
Yes, I was wrong with that.
and the new code introduced an unnecessary 20 milliseconds sleep if all pthread_create failed.
It do like you original code do.
// Retry after tries * 1 millisecond.
struct timespec ts = {0, tries * 1000 * 1000};
It do like you original code do.
It actually doesn't. i was incremented before the control reaches here.
Please take another look. Now I inlined the function because it's now applied to all systems that uses pthread_create and the function look too small to be outlined (and I didn't like its complicated parameter signatures).
The problem with this approach that tests or application that uses oneTBB might terminate because it is just 20 attempts still no guarantee for success. (as with the test that I mentioned above) The idea is to combine this approach with graceful degradation.
In a situation in which pthread_create fails 20 times in a row, other resources are very likely to be also very low, and the system isn't probably stable anyway; the OOM killer might kick in and kill your application however robust it is, for example. There are many other failure scenarios. So can we merge this patch now and discuss further improvements after that? Practically, this patch alone seems to eliminate all crashes we've observed in the wild.