go
go copied to clipboard
syscall: reduce contention on ForkLock
There is a story about optimization at https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/ . It's not the main point of the story, but they mention that they saw a lot of contention on syscall.ForkLock. On a modern GNU/Linux system, the only thing that ever locks ForkLock is syscall.StartProcess (and syscall.ForkExec). So the only effect of ForkLock is to serialize forks. While that is normally unimportant, evidently it sometimes isn't. We should use a different mechanism.
At present ForkLock is a sync.RWMutex, and forking acquires a write lock. But for this purpose we don't need a read-write lock. We need an either-or lock. We can permit an arbitrary number of goroutines creating descriptors, and we can permit an arbitrary number of goroutines calling fork. We just can't permit a mix of those cases.
@ianlancetaylor isn't syscall.ForkLock part of Go1 api? It's a public member of the syscall package and part of go1.txt. How do you propose that a CL change mutex type without breaking compatibility?
@rasky, we could keep it there and use something else instead, which might even be unexported. Then we could document that ForkLock is vestigial and should not be used.
Is anybody actually using ForkLock anywhere? We'd want to search GitHub and Google's internal codebase to check.
There's a bit of non-runtime code that assumes RWMutex.
https://github.com/search?p=1&q=language%3Ago+%22syscall.ForkLock%22+-filename%3Apipe_bsd.go+-filename%3Apipe_linux.go&type=Code&utf8=%E2%9C%93
@adamdecaf, if they're all guarding syscall.Socket or syscall.Pipe, we could just make those do the right thing without requiring ForkLock, and keep ForkLock with its existing type, but document that it's unnecessary.
@bradfitz I'm not sure I follow what you're suggesting. ForkLock always guards a non-atomic sequence of two syscalls, usually the first creates a file descriptor (just like syscall.Socket or syscall.Pipe) and the second sets O_CLOEXEC. Are you suggesting that we change syscall.Pipe for instance, to do two syscalls itself, protected by the newer, non-exported lock that we would add? This does sound dangerous to me, as syscall is supposed to be just a thin wrapper around calling the kernel.
Looking at GitHub code, it looks like that there's no code relying on the exact type of ForkLock but only on its implicit interface. So an intermediate solution would be to switch it to a new type that implements the either-or lock semantic but with the same interface of a RWMutex, so that existing code should not break.
If this is not acceptable, I have a another, more convoluted solution in mind. Let's say we want to implement an either-lock type that implements to this interface:
// xorLock is an interface for an either-or lock; given two arbitrary set of users
// called As and Bs, xorLock can be held by either an arbitrary number of
// As, or an arbitrary number of Bs, but never by both an A and a B at the
// same time.
type xorLock Interface {
func ALock()
func AUnlock()
func BLock()
func BUnlock()
}
The code I see on GitHub always calls ForkLock.RLock but never ForkLock.Lock (in fact, it's all code that creates file descriptors, not implementing forks). Given that ForkLock is a singleton, one could extend it with another non-exported singleton (think forkLock2 of a type implementing xorLock) with the required fields to implement an either-or lock, so that:
forkLock2.ALock() -> ForkLock.RLock()
forkLock2.AUnlock() -> ForkLock.RUnlock()
forkLock2.BLock() -> new code using both ForkLock and forkLock2 members
forkLock2.BUnlock() -> new code using both ForkLock and forkLock2 members
We would then change the runtime to only use forkLock2 for consistency, and leave existing code with calls to ForkLock.RLock.
IIUC, the lock is to prevent child process inheriting open file descriptors that have not yet had close-on-exec set on them in the parent process.
- Since I am interested in only redirecting stdin, stdout and stderr, I close all other file descriptors in child process even though they may not be opened in parent
- This has the overhead of O(n) loop closing all file descriptors based on ulimit (gets worse with increased file limits)
- The overhead is incurred in child process and allows parent to handle more fork/exec without lock
Sample code I tried and am in the process of benchmarking.
#if defined(__linux__)
int closeall() {
struct dirent *dent;
DIR *dirp = opendir("/proc/self/fd");
if (!dirp) {
return -1;
}
while (dent = readdir(dirp)) {
if (dent->d_name[0] == '.') {
continue;
}
int fd = atoi(dent->d_name);
if (fd > 2) {
close(fd);
}
}
return 0;
}
#else
int closeall() {
// Close all descriptors other than those we have redirected via dup2()
struct rlimit rl;
if (getrlimit(RLIMIT_NOFILE, &rl)) {
perror("getrlimit");
return -1;
}
while (rl.rlim_cur > 2) {
close(rl.rlim_cur--);
}
return 0;
}
#endif
static int fork_exec(const char* exe, char *const argv[], int fds[3], char *const envp[]) {
const int pid = vfork();
if (pid) {
return pid;
}
// Redirect stdin, stdout and stderr to parent provided file descriptors
int fd;
for (fd = 0; fd < 3; ++fd) {
if (!(fds[fd] < 0) && fds[fd] != fd) {
dup2(fds[fd], fd);
}
}
do {
// Close all file descriptors that should not be inherited
if (closeall()) {
break;
}
// Start execution in requested image
execve(exe, argv, envp);
perror("execve");
} while(0);
exit(EXIT_FAILURE);
}
Change https://go.dev/cl/421441 mentions this issue: syscall: avoid serializing forks on ForkLock
@rasky how to implement xorLock? It seems that https://go.dev/cl/421441 is not the best solution.
Is there an existing implementation?
Is there any new progress on this PR? @junchuan-tzh @ianlancetaylor I also encountered the problem of high delay in concurrent exec
It seems that the concurrency limit is hard to determine
I've updated https://go.dev/cl/421441. Does anybody see a problem with that approach?
@ianlancetaylor Thanks for your work. This optimization works great for me. I would like to ask if there is a release schedule for this change? 😀
Assuming nobody finds any problems with it, it will be in the 1.21 release, which is scheduled for around August 1.
Change https://go.dev/cl/507355 mentions this issue: syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic