go syscall: reduce contention on ForkLock

trafficstars

There is a story about optimization at https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/ . It's not the main point of the story, but they mention that they saw a lot of contention on syscall.ForkLock. On a modern GNU/Linux system, the only thing that ever locks ForkLock is syscall.StartProcess (and syscall.ForkExec). So the only effect of ForkLock is to serialize forks. While that is normally unimportant, evidently it sometimes isn't. We should use a different mechanism.

At present ForkLock is a sync.RWMutex, and forking acquires a write lock. But for this purpose we don't need a read-write lock. We need an either-or lock. We can permit an arbitrary number of goroutines creating descriptors, and we can permit an arbitrary number of goroutines calling fork. We just can't permit a mix of those cases.

Jan 25 '18 22:01 ianlancetaylor

@ianlancetaylor isn't syscall.ForkLock part of Go1 api? It's a public member of the syscall package and part of go1.txt. How do you propose that a CL change mutex type without breaking compatibility?

Jan 26 '18 09:01 rasky

@rasky, we could keep it there and use something else instead, which might even be unexported. Then we could document that ForkLock is vestigial and should not be used.

Is anybody actually using ForkLock anywhere? We'd want to search GitHub and Google's internal codebase to check.

Jan 26 '18 16:01 bradfitz

There's a bit of non-runtime code that assumes RWMutex.

https://github.com/search?p=1&q=language%3Ago+%22syscall.ForkLock%22+-filename%3Apipe_bsd.go+-filename%3Apipe_linux.go&type=Code&utf8=%E2%9C%93

Jan 26 '18 17:01 adamdecaf

@adamdecaf, if they're all guarding syscall.Socket or syscall.Pipe, we could just make those do the right thing without requiring ForkLock, and keep ForkLock with its existing type, but document that it's unnecessary.

Jan 26 '18 18:01 bradfitz

@bradfitz I'm not sure I follow what you're suggesting. ForkLock always guards a non-atomic sequence of two syscalls, usually the first creates a file descriptor (just like syscall.Socket or syscall.Pipe) and the second sets O_CLOEXEC. Are you suggesting that we change syscall.Pipe for instance, to do two syscalls itself, protected by the newer, non-exported lock that we would add? This does sound dangerous to me, as syscall is supposed to be just a thin wrapper around calling the kernel.

Looking at GitHub code, it looks like that there's no code relying on the exact type of ForkLock but only on its implicit interface. So an intermediate solution would be to switch it to a new type that implements the either-or lock semantic but with the same interface of a RWMutex, so that existing code should not break.

If this is not acceptable, I have a another, more convoluted solution in mind. Let's say we want to implement an either-lock type that implements to this interface:

// xorLock is an interface for an either-or lock; given two arbitrary set of users 
// called As and Bs, xorLock can be held by either an arbitrary number of 
// As, or an arbitrary number of Bs, but never by both an A and a B at the 
// same time.
type xorLock Interface {
    func ALock() 
    func AUnlock()

    func BLock()
    func BUnlock()
}

The code I see on GitHub always calls ForkLock.RLock but never ForkLock.Lock (in fact, it's all code that creates file descriptors, not implementing forks). Given that ForkLock is a singleton, one could extend it with another non-exported singleton (think forkLock2 of a type implementing xorLock) with the required fields to implement an either-or lock, so that:

forkLock2.ALock() -> ForkLock.RLock() forkLock2.AUnlock() -> ForkLock.RUnlock() forkLock2.BLock() -> new code using both ForkLock and forkLock2 members forkLock2.BUnlock() -> new code using both ForkLock and forkLock2 members

We would then change the runtime to only use forkLock2 for consistency, and leave existing code with calls to ForkLock.RLock.

Jan 27 '18 10:01 rasky

IIUC, the lock is to prevent child process inheriting open file descriptors that have not yet had close-on-exec set on them in the parent process.

Since I am interested in only redirecting stdin, stdout and stderr, I close all other file descriptors in child process even though they may not be opened in parent
This has the overhead of O(n) loop closing all file descriptors based on ulimit (gets worse with increased file limits)
The overhead is incurred in child process and allows parent to handle more fork/exec without lock

Sample code I tried and am in the process of benchmarking.


#if defined(__linux__)
int closeall() {
	struct dirent *dent;
	DIR *dirp = opendir("/proc/self/fd");

	if (!dirp) {
		return -1;
	}

	while (dent = readdir(dirp)) {
		if (dent->d_name[0] == '.') {
			continue;
		}

		int fd = atoi(dent->d_name);
		if (fd > 2) {
			close(fd);
		}
	}

	return 0;
}
#else
int closeall() {
	// Close all descriptors other than those we have redirected via dup2()
	struct rlimit rl;
	if (getrlimit(RLIMIT_NOFILE, &rl)) {
		perror("getrlimit");
		return -1;
	}

	while (rl.rlim_cur > 2) {
		close(rl.rlim_cur--);
	}

	return 0;
}
#endif

static int fork_exec(const char* exe, char *const argv[], int fds[3], char *const envp[]) {
	const int pid = vfork();

	if (pid) {
		return pid;
	}

	// Redirect stdin, stdout and stderr to parent provided file descriptors
	int fd;
	for (fd = 0; fd < 3; ++fd) {
		if (!(fds[fd] < 0) && fds[fd] != fd) {
			dup2(fds[fd], fd);
		}
	}

	do {
		// Close all file descriptors that should not be inherited
		if (closeall()) {
			break;
		}

		// Start execution in requested image
		execve(exe, argv, envp);
		perror("execve");
	} while(0);

	exit(EXIT_FAILURE);
}

May 05 '21 20:05 mechanicker

Change https://go.dev/cl/421441 mentions this issue: syscall: avoid serializing forks on ForkLock

Aug 06 '22 01:08 gopherbot

@rasky how to implement xorLock? It seems that https://go.dev/cl/421441 is not the best solution.

Is there an existing implementation?

Aug 23 '22 04:08 junchuan-tzh

Is there any new progress on this PR? @junchuan-tzh @ianlancetaylor I also encountered the problem of high delay in concurrent exec

It seems that the concurrency limit is hard to determine

May 08 '23 15:05 zhuangqh

I've updated https://go.dev/cl/421441. Does anybody see a problem with that approach?

May 22 '23 22:05 ianlancetaylor

@ianlancetaylor Thanks for your work. This optimization works great for me. I would like to ask if there is a release schedule for this change? 😀

May 24 '23 03:05 zhuangqh

Assuming nobody finds any problems with it, it will be in the 1.21 release, which is scheduled for around August 1.

May 24 '23 03:05 ianlancetaylor

Change https://go.dev/cl/507355 mentions this issue: syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic

Jun 30 '23 15:06 gopherbot

go go copied to clipboard

syscall: reduce contention on ForkLock

go
go copied to clipboard