golang-set icon indicating copy to clipboard operation
golang-set copied to clipboard

Deadlock risk in threadsafe set

Open rickb777 opened this issue 6 months ago • 7 comments

Many of the methods in the threadsafe set use the obvious pattern

s.Lock()
o.Lock()

...  do something

s.Unlock()
o.Unlock()

This can deadlock if two goroutines are concurrently doing similar operations but with the operands reversed.

The standard solution derives from database locks and is to release locks in the opposite order. I think this is called the Resource Allocation Protocol but my recollection is hazy.

If you were to use defer always, then you would get this for free, i.e.

s.Lock()
defer s.Unlock()
o.Lock()
defer o.Unlock()

...  do something

This should be free from this kind of deadlock because the defer stack calls its functions in reverse order.

However, to reduce the chance this bug might creep back in later, it might be preferable to make the issue explicit like this.

s.Lock()
o.Lock()

...  do something

o.Unlock() // unlock o first to avoid deadlock
s.Unlock()

rickb777 avatar May 17 '25 16:05 rickb777

Hello,

Thanks for the issue.

I haven’t audited the locking code in a long time. I’m aware of the defer keyword but there was a decision to avoid defer due to the small but measurable performance impact it had on this code.

Since most functions are reasonably small in complexity the decision was made with input from other contributors to just keep the mutex locking/unlocking manual.

Maybe the performance impact of defer is so small now it’s worth changing that.

But if I recall, as long as all of our lock/unlocks use a global order and are symmetric we shouldn’t have deadlocks. In fact, none have been reported to me in the years of this package existing.

That said, an audit of the code is still worthwhile since it hasn’t been done in a while.

deckarep avatar May 17 '25 17:05 deckarep

Here's a worked example of a deadlock: https://go.dev/play/p/wju1v6jPbGN

Most lock/unlocks in the code look OK to me, but Each and (transitively) Elements aren't. It would be simple and safe to just use defers everywhere, but I can't speak to the performance aspect.

Panics are not the only demonstrable source of trouble: runtime.Goexit would have the same effect.

ToSlice also locks later than it should, but in a way that shouldn't crash anything; it can just harmlessly preallocate a wrong capacity.

Edit: There is also the caveat that loop bodies used with Each and Elements shouldn't use any set methods unless the program can guarantee that nothing else will be concurrently mutating the set(s). (Otherwise, the existence of a blocked writer is enough to deadlock the reentrant readers.) This can't be fixed with defers, but a prominent warning might be helpful.

hemflit avatar Jun 04 '25 18:06 hemflit

Excellent catch! This is definitely an issue. I'll follow-up with a PR unless someone gets to it before me!

Here's my plan of action:

  • Immediately fix the Each function adding the early return Unlock
  • Follow-up in a separate PR, to re-evaluate using defer everywhere which would avoid bugs like this. I want to stress that using defer is definitely ideal but the cost in tight multi-threaded code was more expensive then one would think even after the Go team landed a significant defer optimization.

deckarep avatar Jun 04 '25 19:06 deckarep

On reconsideration, I think there's another class of deadlocks that can happen, when the read-methods that access two sets are combined with modify-methods:

  • Goroutine 1 is calling a.Equal(b) and has just read-locked a
  • Goroutine 2 is calling b.Equal(a) and has just read-locked b
  • Goroutine 3 is calling a.Add(...) and now tries to write-lock a
  • Goroutine 4 is calling b.Add(...) and now tries to write-lock b
  • Then goroutines 1 and 2 progress to trying to read-lock b and a respectively.
  • Now 1 is blocked by 4 is blocked by 2 is blocked by 3 is blocked by 1.

I don't think any simple refactor can fix this, but imposing an order on the locks should. (E.g. always lock a before b, regardless of which one received the method call.) I'll try to sketch up a PR for that sometime soon - without any premature assumptions that you'll actually want to make that kind of a change, of course, rather just to explore and document what it would look like.

hemflit avatar Jun 05 '25 10:06 hemflit

You know what, after taking a closer look at Each, this is not an early return. This is just a break statement so all it means is the for loop possibly can bail early and the last statement which is t.RUnlock() must run. I fail to see how there is a problem with Each.

deckarep avatar Jun 05 '25 22:06 deckarep

Each wants to unconditionally ensure that RUnlock will get called, but fails to do so if its call to cb happens to panic. Then any other code recovering from that panic will unknowingly leave the set in a broken and dangerous state.

defer t.RUnlock() immediately after locking would fix that completely.

hemflit avatar Jun 06 '25 14:06 hemflit

https://github.com/deckarep/golang-set/pull/164

@hemflit - it seems fitting that you can take a gander at my PR. Mainly the test case which I have simplified to have a little less boilerplate.

deckarep avatar Jun 06 '25 19:06 deckarep