oneTBB icon indicating copy to clipboard operation
oneTBB copied to clipboard

Retry if pthread_create fails with EAGAIN

Open rui314 opened this issue 3 years ago • 4 comments
trafficstars

Description

On many Unix-like systems, pthread_create can fail spuriously even if the running machine has enough resources to spawn a new thread. Therefore, if EAGAIN is returned from pthread_create, we actually have to try again.

I observed this issue when running the mold linker (https://github.com/rui314/mold) under a heavy load. mold uses OneTBB for parallelization.

As another data point, Go has the same logic to retry on EAGAIN: https://go-review.googlesource.com/c/go/+/33894/

nanosleep is defined in POSIX 2001, so I believe that all Unix-like systems support it.

  • [x] - git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)

Type of change

  • [x] bug fix - change that fixes an issue
  • [ ] new feature - change that adds functionality
  • [ ] tests - change in tests
  • [ ] infrastructure - change in infrastructure and CI
  • [ ] documentation - documentation update

Tests

  • [ ] added - required for new features and some bug fixes
  • [x] not needed

Documentation

  • [ ] updated in # - add PR number
  • [ ] needs to be updated
  • [x] not needed

Breaks backward compatibility

  • [ ] Yes
  • [x] No
  • [ ] Unknown

Notify the following users

Other information

rui314 avatar May 07 '22 12:05 rui314

Ping?

rui314 avatar May 18 '22 12:05 rui314

May I please ping this?

marxin avatar Jul 22 '22 10:07 marxin

Would it be possible to land this ? thanks

sylvestre avatar Aug 05 '22 09:08 sylvestre

~~This seems to cause one of the tests to fail:~~

62/136 Testing: test_eh_thread
62/136 Test: test_eh_thread
Command: "/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1_build-abi_x86_64.amd64/gnu_11.3_cxx11_64_relwithdebinfo/test_eh_t
hread" "--force-colors=1"
Directory: /var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1_build-abi_x86_64.amd64/gnu_11.3_cxx11_64_relwithdebinfo
"test_eh_thread" start time: Aug 19 16:44 BST
Output:
----------------------------------------------------------
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
===============================================================================
/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1/test/tbb/test_eh_thread.cpp:90:
TEST CASE:  Too many threads

/var/tmp/portage/dev-cpp/tbb-2021.7.0_rc1/work/oneTBB-2021.7.0-rc1/test/tbb/test_eh_thread.cpp:90: ERROR: test case THREW exception: Resource temporarily unavailable

===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 2 | 2 passed | 0 failed |
[doctest] Status: FAILURE!
<end of output>
Test time =   0.00 sec
----------------------------------------------------------
Test Failed.
"test_eh_thread" end time: Aug 19 16:44 BST
"test_eh_thread" time elapsed: 00:00:00
----------------------------------------------------------

~~It seems like this test is supposed to check for the case where there are too many threads and pthread_create returns EAGAIN, but I'm not sure what to look at following that to debug. Maybe thread_monitor::launch is now throwing the wrong kind of exception so the test isn't catching it?~~


Sorry this test might just be flakey/failing randomly on my system... Please ignore, will do more testing :)

MatthewGentoo avatar Aug 19 '22 16:08 MatthewGentoo

Can someone in the TBB team review and merge this patch? More and more Linux distros are cherrypicking this as an unofficial patch.

rui314 avatar Oct 10 '22 06:10 rui314

@rui314 According to https://man7.org/linux/man-pages/man3/pthread_create.3.html

EAGAIN A system-imposed limit on the number of threads was encountered. There are a number of limits that may trigger this error: the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached; the kernel's system-wide limit on the number of processes and threads, /proc/sys/kernel/threads-max, was reached (see proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)).

Or here https://pubs.opengroup.org/onlinepubs/9699919799/

[EAGAIN] The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process {PTHREAD_THREADS_MAX} would be exceeded.

So this means that the system either lacks the necessary resources, or has been set some system limit.

Is there any thread, article or bug report where mentioned that pthread_create can fail even with having enough resources or respecting a set limit?

isaevil avatar Oct 10 '22 09:10 isaevil

I've seen pthread_create fail with EAGAIN on busy systems, without reaching any user or kernel limit I was aware of. I suspect @rui314 has a collection of bug reports like this.

lnicola avatar Oct 10 '22 09:10 lnicola

Also if this is some particular problem/bug I would expect to wrap the solution into macros. In my point of view this particular workaround looks strange because OS reported that there are not enough resources but we spins and wait for something. (It is counter intuitive)

pavelkumbrasev avatar Oct 10 '22 09:10 pavelkumbrasev

There seems to be some discussions on Go discussion board, which you can visit from https://go-review.googlesource.com/c/go/+/33894/. Note that Go does the same thing as this is. I also got many bug reports caused by the spurious failure of pthread_create.

I can wrap this with a function (not a macro because it's bad for readability), but since this function itself is a wrapper function to call pthread_create, wrapping it with one more function doesn't seems like a good idea.

rui314 avatar Oct 10 '22 10:10 rui314

The macro could showed that this is known problem, because as I said this behavior is counter intuitive (and pthread_create API). Could you also please add a test case that will reproduce this problem?

pavelkumbrasev avatar Oct 10 '22 10:10 pavelkumbrasev

We need to count the number of iterations while retrying, so macro wouldn't be a good choice. Let me factor it out as a function.

rui314 avatar Oct 10 '22 11:10 rui314

I factored out the code to a new function and wrote a test.

rui314 avatar Oct 12 '22 08:10 rui314

@rui314 I tried reproduce this failure with test you proposed and it didn't fail. Are there special conditions that should be met to reproduce it? Also, I think if proposed solution will be used on system with real limit we might face with situation when thread will wait on this loop instead useful work.

pavelkumbrasev avatar Oct 12 '22 13:10 pavelkumbrasev

@rui314 I tried reproduce this failure with test you proposed and it didn't fail. Are there special conditions that should be met to reproduce it?

The test cannot reproduce the issue unless your system is under a heavy load, and even under a heavy load, the probably that the problem occur is very low. I don't think you can see a failure easily.

On the other hand, that also means all applications that currently use OneTBB is subtly unstable and experiencing occasional mysterious crash under a heavy load, which is I believe a critical issue, though.

Also, I think if proposed solution will be used on system with real limit we might face with situation when thread will wait on this loop instead useful work.

Correct. It can cause a delay up to 210 milliseconds for a real failure that returns EAGAIN. That's unfortunately unavoidable as a spurious failure is indistinguishable from non-spurious ones. Hopefully, if a system is so low on resource that it returns a real EAGAIN error, it's sluggish and the 210 milliseconds delay doesn't matter much.

rui314 avatar Oct 13 '22 07:10 rui314

The test cannot reproduce the issue unless your system is under a heavy load, and even under a heavy load, the probably that the problem occur is very low. I don't think you can see a failure easily.

I can quite stably reproduce that when using mold linker in GCC test suite on a pretty modern AMD Zen system. When running 16 parallel tests each spawning 16 threads in mold.

marxin avatar Oct 13 '22 08:10 marxin

Accordance with Dmitry Vyukov tweet it is general problem with pthread_create. Is there issue that was created on glibc maintainers? (As I understand this issue persist for more than 6 years already) I also found out that some other projects use retry loop on pthread_create. But the amount of retry iterations depends on project. Were there any checked performed to pick retry constant = 20 or it is copy from Go? We will discuss the way how oneTBB should handle this particular problem. It could be graceful degradation or some retry loop.

pavelkumbrasev avatar Oct 13 '22 09:10 pavelkumbrasev

I don't know if there's a bug filed to glibc for this particular issue. glibc's pthread_create(2) is after all just a wrapper to the system call, so it's not their problem, no?

Speaking of the retry count, 20 is copied from Go. I don't know if it is the best max retry count, but it should at least be battle-tested in the wild.

rui314 avatar Oct 13 '22 09:10 rui314

glibc's pthread_create(2) is after all just a wrapper to the system call, so it's not their problem, no?

Uh, what? pthread_create does a whole lot of stuff, for example apply the pthread_attr_t, allocate a stack, allocate thread local storage, and various stuff I don't know what it means. https://github.com/bminor/glibc/blob/master/nptl/pthread_create.c#L619

Even glibc clone() isn't a straight syscall wrapper; glibc clone() calls a function in the child, but kernel clone() returns twice. https://man7.org/linux/man-pages/man2/clone.2.html#NOTES https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/clone.S

Alcaro avatar Oct 13 '22 10:10 Alcaro

Sorry for noise, I wrote a bit compact code for same, which does not call nanosleep() with zeros on first error.

static int
pthread_create_eagain(pthread_t *handle, const pthread_attr_t *attr,
    void *(*fn)(void*), void *arg) {
	int error;
	const size_t max_tries = 20;
	struct timespec rqts = {0, 0};

	for (size_t i = 1; i <= max_tries; i ++) {
		error = pthread_create(handle, attr, fn, arg);
		if (0 == error || /* Ok, done. */
		    EAGAIN != error) /* Other error. */
			return (error);
		/* Retry after tries * 1 millisecond. */
		rqts.tv_nsec = (i * 1000 * 1000);
		nanosleep(&rqts, NULL);
	}

	return (EAGAIN);
}

rozhuk-im avatar Oct 18 '22 12:10 rozhuk-im

@rui314 Could you please apply @rozhuk-im proposal and remove test (it doesn't reproduce the problem)?

pavelkumbrasev avatar Oct 21 '22 12:10 pavelkumbrasev

@pavelkumbrasev Do you want me to remove #ifdef __linux__? There's no guarantee that other systems will never fail with the same spurious error.

rui314 avatar Oct 21 '22 13:10 rui314

Simplified the code and removed the test.

rui314 avatar Oct 21 '22 13:10 rui314

Actually the original code was better. I didn't call nanosleep with 0 milliseconds delay, and the new code introduced an unnecessary 20 milliseconds sleep if all pthread_create failed. I'll roll it back.

rui314 avatar Oct 21 '22 13:10 rui314

Could you please check why test is failing? image

pavelkumbrasev avatar Oct 21 '22 14:10 pavelkumbrasev

I don't know why, but that test fails even without my change.

rui314 avatar Oct 21 '22 14:10 rui314

I didn't call nanosleep with 0 milliseconds delay,

Yes, I was wrong with that.

and the new code introduced an unnecessary 20 milliseconds sleep if all pthread_create failed.

It do like you original code do.

    // Retry after tries * 1 millisecond.
    struct timespec ts = {0, tries * 1000 * 1000};

rozhuk-im avatar Oct 21 '22 19:10 rozhuk-im

It do like you original code do.

It actually doesn't. i was incremented before the control reaches here.

rui314 avatar Oct 21 '22 23:10 rui314

Please take another look. Now I inlined the function because it's now applied to all systems that uses pthread_create and the function look too small to be outlined (and I didn't like its complicated parameter signatures).

rui314 avatar Oct 22 '22 00:10 rui314

The problem with this approach that tests or application that uses oneTBB might terminate because it is just 20 attempts still no guarantee for success. (as with the test that I mentioned above) The idea is to combine this approach with graceful degradation.

pavelkumbrasev avatar Oct 24 '22 13:10 pavelkumbrasev

In a situation in which pthread_create fails 20 times in a row, other resources are very likely to be also very low, and the system isn't probably stable anyway; the OOM killer might kick in and kill your application however robust it is, for example. There are many other failure scenarios. So can we merge this patch now and discuss further improvements after that? Practically, this patch alone seems to eliminate all crashes we've observed in the wild.

rui314 avatar Oct 24 '22 14:10 rui314