async icon indicating copy to clipboard operation
async copied to clipboard

Should thread IDs be stored weakly?

Open treeowl opened this issue 1 year ago • 3 comments

GHC.Conc.Sync currently says

-- ToDo: data ThreadId = ThreadId (Weak ThreadId#)
-- But since ThreadId# is unlifted, the Weak type must use open
-- type variables.

That seems unlikely to happen, especially now that we have threadStatus (which needs to be able to determine how a thread died).

However, the condition that comment waits for has arrived—we can now have values of type Weak# ThreadId#, which are just slightly more efficient than the Weak# ThreadId values we could have before. No, I don't know why that was considered important. Anyway, the question arises as to whether Asyncs should hold their ThreadId#s weakly. That would mean doing something like

data WeakThreadId = WeakThreadId
  { tid_number :: !Word64
  , tid_weak :: Weak# ThreadId#
  }

The thread ID number is needed to support compareAsyncs and the Ord (Async a) instance.

Making this change would necessitate an API change: the asyncThreadId function (defined as a field accessor) would have to be removed, in favor of something like

asyncThreadIdMaybe :: Async a -> IO (Maybe ThreadId)

One big question in my mind: garbage collecting Weak#s can be pretty expensive. Is releasing threads earlier worth that cost?

treeowl avatar Mar 02 '23 22:03 treeowl

My guess is it probably isn't worth it. A completed ThreadId doesn't take much space.

simonmar avatar Mar 03 '23 08:03 simonmar

@teofilC points out that in certain cases failing to release threads prevents deadlock detection. I have no idea how that works, so can't comment on the significance. See this comment. BTW, the whole situation around "indefinitely blocked on MVar/STM" seems very complicated and subtle, and I don't even know where to find good documentation. Can you point me in the right direction?

treeowl avatar Mar 03 '23 08:03 treeowl

There's a bit of documentation here https://hackage.haskell.org/package/base-4.18.0.0/docs/Control-Concurrent.html#g:14

In particular it's a debugging feature so it doesn't have a well-defined semantics. To have a semantics would mean specifying the garbage collector, the code generator, the simplifier... basically everything, because it depends on exactly which closures contain which references and how the GC traces things.

The emergent behaviour is subtle and surprising, indeed I've been surprised by it myself on many occasions. That's obviously not great, but I also don't have any good ideas for how to make it better, and it is still a useful feature.

simonmar avatar Sep 03 '23 18:09 simonmar