
syscall: swap use of read-write locks for ForkLock


The ForkLock disables forking while a new fd is created and marked close-on-exec: fd ops hold the read lock, and fork holds the write lock.

In kernels newer than 2.6, the two fd operations can be done in a single step (e.g., with O_CLOEXEC), so the ForkLock is not needed there. In fact, the ForkLock is a bottleneck for concurrent forks.

Therefore, we swap the read and write locks: fd ops take the write lock, and fork takes the read lock. This is reasonable since fd ops "modify" process state, while fork just "reads" it. With this change, newer kernels can remove the concurrency bottleneck, and older kernels can still ensure the safety of fd close-on-exec.

junchuan-tzh avatar Aug 01 '22 08:08 junchuan-tzh
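
For context, a minimal sketch of the two patterns being contrasted, assuming a Linux-style O_CLOEXEC flag (the function names are illustrative, not from the Go tree):

    // Pre-O_CLOEXEC pattern: two steps under the ForkLock read lock,
    // so that a concurrent fork cannot inherit the fd before
    // close-on-exec has been set on it.
    func openCloexecLegacy(path string) (int, error) {
        syscall.ForkLock.RLock()
        defer syscall.ForkLock.RUnlock()
        fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
        if err != nil {
            return -1, err
        }
        syscall.CloseOnExec(fd)
        return fd, nil
    }

    // Kernels newer than 2.6 can do both in one atomic step, so no
    // lock is needed for correctness.
    func openCloexecAtomic(path string) (int, error) {
        return syscall.Open(path, syscall.O_RDONLY|syscall.O_CLOEXEC, 0)
    }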

This is not an API change so it doesn't have to go through the proposal process.

ianlancetaylor avatar Aug 02 '22 01:08 ianlancetaylor

This is a breaking change. Consumers that have to set close-on-exec on fds themselves may already be using syscall.ForkLock, and you cannot force everyone to replace Lock with RLock and vice versa.

rittneje avatar Aug 02 '22 04:08 rittneje

syscall.ForkLock is used in only a few places (Pipe, Socket, Accept, Open, Dup), and swapping Lock and RLock at the call sites that already use syscall.ForkLock is enough. Higher-level APIs and consumers are not aware of it.

junchuan-tzh avatar Aug 02 '22 09:08 junchuan-tzh

No, syscall.ForkLock is an exported variable that anyone can use. And it is required when you are working with lower-level APIs (e.g., syscall.Socketpair) and have to set close-on-exec yourself.

Also, your statement that kernels newer than 2.6 can do this atomically (via O_CLOEXEC, etc.) only applies to Linux. For example, macOS does not support that flag in all cases. In other words, this change would cause performance issues for macOS applications that create new fds more often than they fork, which I suspect is the majority case.

rittneje avatar Aug 02 '22 10:08 rittneje
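
To illustrate the external usage at stake, here is a sketch of hypothetical code outside the standard library that follows the documented ForkLock protocol:

    // Hold the read lock while creating fds and marking them
    // close-on-exec; swapping Lock/RLock inside the standard library
    // would silently break this protocol.
    func socketpairCloexec() (fds [2]int, err error) {
        syscall.ForkLock.RLock()
        defer syscall.ForkLock.RUnlock()
        fds, err = syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
        if err != nil {
            return fds, err
        }
        syscall.CloseOnExec(fds[0])
        syscall.CloseOnExec(fds[1])
        return fds, nil
    }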

Would using an either-or lock be OK? See #23558.

Like this (only the first fork takes the write lock):

close-on-exec:

    ForkLock.RLock()
    set_fd_close_on_exec()
    ForkLock.RUnlock()

fork:

    Mutex.Lock()
    if refcnt == 0:
        // take the write lock for the first fork (it is also ok if a non-fork op holds the write lock)
        ForkLock.Lock()
    refcnt += 1
    Mutex.Unlock()

    do_forkexec()

    Mutex.Lock()
    refcnt -= 1
    if refcnt == 0:
        // release the write lock for the last fork
        ForkLock.UnLock()
    Mutex.Unlock()

junchuan-tzh avatar Aug 05 '22 03:08 junchuan-tzh

I'm assuming you meant to write ForkLock.Unlock() for the second block. I think that will work.

rittneje avatar Aug 05 '22 03:08 rittneje

Yes, that was a typo.

There is a potential problem, though: the latency to acquire ForkLock.RLock() can become very long if there are many concurrent forks. Is there a better way to build an either-or lock?

junchuan-tzh avatar Aug 05 '22 03:08 junchuan-tzh

Unfortunately, I think you'd have to be able to ask the ForkLock whether there are pending readers in order to avoid that, and currently that information is not exposed.

rittneje avatar Aug 05 '22 04:08 rittneje
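
For reference, a Go rendering of the refcount scheme sketched above might look like this (illustrative only; forkMu and forkCount are made-up names):

    var (
        forkMu    sync.Mutex // guards forkCount
        forkCount int        // number of in-flight forks
    )

    func beginFork() {
        forkMu.Lock()
        if forkCount == 0 {
            syscall.ForkLock.Lock() // first fork takes the write lock
        }
        forkCount++
        forkMu.Unlock()
    }

    func endFork() {
        forkMu.Lock()
        forkCount--
        if forkCount == 0 {
            syscall.ForkLock.Unlock() // last fork releases it
        }
        forkMu.Unlock()
    }

Overlapping forks piggyback on a single write lock here, which is exactly the RLock starvation problem just discussed.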

Perhaps we should close this issue as a dup of #23558, as they are about the same problem.

ianlancetaylor avatar Aug 06 '22 00:08 ianlancetaylor

Let me add that both this and #23558 have the same problem: we unfortunately exported syscall.ForkLock, and it is at least possible that people are using it, and that makes it difficult to change.

ianlancetaylor avatar Aug 06 '22 00:08 ianlancetaylor

OK, I think I see a way: https://go.dev/cl/421441. Anybody see a way to improve that code?

ianlancetaylor avatar Aug 06 '22 01:08 ianlancetaylor

Change https://go.dev/cl/421441 mentions this issue: syscall: avoid serializing forks on ForkLock

gopherbot avatar Aug 06 '22 01:08 gopherbot

@ianlancetaylor This is the approach that @junchuan-tzh mentioned above. As discussed, it has an issue: if forkExec is being invoked continually, it will prevent syscall.ForkLock.RLock() from ever unblocking.

rittneje avatar Aug 06 '22 01:08 rittneje

@rittneje My apologies for not reading more carefully.

I've updated https://go.dev/cl/421441 to avoid that problem. But it may be too complicated now. What do you think of that approach? Thanks.

ianlancetaylor avatar Aug 06 '22 19:08 ianlancetaylor

Indeed, it is fairly complex. The choice of 10 as the (sort of) concurrency limit also seems very arbitrary.

Perhaps sync.Cond would be a better choice than adding a channel? Just to reduce the number of players.

var forking int // in-flight forks; guarded by forkingCond.L

var forkingCond = sync.NewCond(new(sync.Mutex))

...

forkingCond.L.Lock()
for forking > 10 { // re-check the condition in a loop, as Cond.Wait requires
    forkingCond.Wait()
}
if forking == 0 {
    syscall.ForkLock.Lock()
}
forking++
forkingCond.L.Unlock()

defer func() {
    forkingCond.L.Lock()
    forking--
    if forking == 0 {
        syscall.ForkLock.Unlock()
        forkingCond.Broadcast()
    }
    forkingCond.L.Unlock()
}()

rittneje avatar Aug 06 '22 21:08 rittneje

My personal feeling is that sync.Cond never makes things clearer. See also #21165.

ianlancetaylor avatar Aug 06 '22 22:08 ianlancetaylor
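
The CL itself is not reproduced here, but a channel-based semaphore in the spirit of that limit might look like the following (names are illustrative; this simplified sketch still has the starvation property discussed above, which the actual CL added more machinery to avoid):

    var (
        forkSem   = make(chan struct{}, 10) // at most 10 forks in flight
        forkingMu sync.Mutex                // guards forking
        forking   int
    )

    func limitedBeginFork() {
        forkSem <- struct{}{} // blocks once the limit is reached
        forkingMu.Lock()
        if forking == 0 {
            syscall.ForkLock.Lock()
        }
        forking++
        forkingMu.Unlock()
    }

    func limitedEndFork() {
        forkingMu.Lock()
        forking--
        if forking == 0 {
            syscall.ForkLock.Unlock()
        }
        forkingMu.Unlock()
        <-forkSem
    }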

I've updated https://go.dev/cl/421441 to use a different approach, using internal secret knowledge about sync.RWMutex. Does anybody see a problem with this approach? Thanks.

ianlancetaylor avatar May 22 '23 22:05 ianlancetaylor

@ianlancetaylor I am confused by the call to runtime.Gosched(). In order for the preceding call to ForkLock.RLock() to unblock, whatever was holding the write lock must have unlocked it. That means (I assume) that whatever calls ForkLock.Lock() next would have to wait for those readers to finish anyway.

rittneje avatar May 22 '23 23:05 rittneje

@rittneje Thanks, you're right, that's not needed. I was thinking that the RLock callers would be woken up but would still have to acquire the read lock, but that's not how it works. The read lock is already held when they are woken up. I will remove the Gosched call.

ianlancetaylor avatar May 22 '23 23:05 ianlancetaylor

@ianlancetaylor I think there may still be a potential for starvation hidden here. There is no guarantee that goroutines will acquire the (logical) fork lock in order. For example:

  1. Goroutine A calls ForkLock.Lock().
  2. Goroutine B calls ForkLock.RLock(), which blocks.
  3. Goroutine C tries to acquire the fork lock. It sees there is a pending reader (B), so it waits.
  4. Goroutine A releases the write lock.
  5. Goroutine B acquires then releases the read lock.
  6. Goroutine D calls ForkLock.Lock().
  7. Goroutine E calls ForkLock.RLock(), which blocks.
  8. Goroutine C is finally scheduled and tries to acquire the fork lock again. Once again, it is thwarted by the pending reader (E), so it waits.

In short, there is a possibility that a goroutine that is intending to fork keeps losing out to other goroutines indefinitely.

rittneje avatar May 23 '23 00:05 rittneje

@rittneje I don't think that can happen with the current RWMutex implementation. When goroutine C blocks trying to acquire the read lock, it increments readerCount before it blocks. The subsequent call to Lock in D will block until readerCount drops back to zero.

ianlancetaylor avatar May 23 '23 00:05 ianlancetaylor

To put it another way, I think that if starvation is possible with this CL, then it is possible with any use of RWMutex, and in particular it's possible today without this CL.

ianlancetaylor avatar May 23 '23 00:05 ianlancetaylor

> When goroutine C blocks trying to acquire the read lock, it increments readerCount before it blocks. The subsequent call to Lock in D will block until readerCount drops back to zero.

But that doesn't matter. Even if D blocks, then when C gets scheduled it will immediately release the read lock, which unblocks D. Then when C goes back to the top of the for loop, it will observe that the ForkLock is already taken and will once again fall through to the pending-readers check.

I think the correct fix is to remove the for loop. Instead, it should unconditionally acquire the logical fork lock after waiting for pending readers once.

rittneje avatar May 23 '23 01:05 rittneje

Oh, sorry, I did misunderstand. I see what you mean. Done.

ianlancetaylor avatar May 23 '23 03:05 ianlancetaylor
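
Putting the exchange together, the acquire side of the revised approach has roughly this shape (a sketch reusing the forkingMu/forking bookkeeping from the channel sketch above; hasPendingReaders stands in for the unexported sync.RWMutex knowledge Ian mentions and is not a real API):

    func acquireLogicalForkLock() {
        forkingMu.Lock()
        defer forkingMu.Unlock()
        if forking == 0 {
            // No fork holds the write lock yet; take it.
            syscall.ForkLock.Lock()
            forking++
            return
        }
        // The write lock is already held on behalf of earlier forks.
        // To avoid starving fd operations, wait once for pending
        // readers by cycling through the read lock, then retake the
        // write lock unconditionally (no retry loop, per the fix above).
        if hasPendingReaders(&syscall.ForkLock) { // hypothetical helper
            syscall.ForkLock.Unlock()
            syscall.ForkLock.RLock()
            syscall.ForkLock.RUnlock()
            syscall.ForkLock.Lock()
        }
        forking++
    }

Briefly dropping the write lock while other forks are in flight is only safe on platforms where fd creation is atomic, which appears to be what the follow-up change at the end of this thread addresses.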

Thanks, I think it looks good now. One question I do have is whether it would be worth checking for pending writers in addition to pending readers. Otherwise, if there is code outside syscall doing ForkLock.Lock() (which is pretty unlikely but not impossible), it could get starved now.

rittneje avatar May 23 '23 03:05 rittneje

> Thanks, I think it looks good now. One question I do have is whether it would be worth checking for pending writers in addition to pending readers. Otherwise, if there is code outside syscall doing ForkLock.Lock() (which is pretty unlikely but not impossible), it could get starved now.

I think this is as expected if the user misuses the ForkLock.

zhuangqh avatar May 23 '23 04:05 zhuangqh

I'm OK with assuming that no program calls syscall.ForkLock.Lock(). Even if there are such programs, they will continue to work unless they also fork new processes continuously. So I'm willing to wait to see if anybody reports a bug.

ianlancetaylor avatar May 23 '23 16:05 ianlancetaylor

Change https://go.dev/cl/507355 mentions this issue: syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic

gopherbot avatar Jun 30 '23 15:06 gopherbot