`CacheRef` does not document how it should not be carried over an await point
If a channel is created using a GuildRef from a task other than the gateway task (e.g. an event handler task), a deadlock can ensue. This is because GuildRef wraps Ref. If the GuildRef is still being held when the Channel Create event is received, then serenity will attempt to update the Guild in the cache with new channels. This will call get_mut on the guilds DashMap, which may deadlock if a Ref is already held, thus causing a deadlock if the two tasks happen to be running on the same system thread at that particular time (which depends on tokio internals).
The root cause here is attempting to hold an equivalent of a sync lock across await boundaries. I don't really see a good fix for the root cause, but it may be good to keep an eye on https://github.com/xacrimon/dashmap/issues/150.
This is impossible to trigger in reality right, because CacheRef is !Send and therefore you cannot hold it over .await?
Well I was holding it over await and this issue was occurring.
I'm also getting what appears to be a deadlock somewhere around the threads API, and I believe it's on get_archived_public_threads (judging by the logs of my app).
@Gaming32 If I'm understanding from your code, you avoid the deadlock by always getting the guild channel? Rather than re-using one already gotten?
I can see the problem here. Usually, CacheRef will prevent you holding any reference to the cache over an await point, which avoids any deadlocking... but you are running in an unusual way where you are performing this cache lookup in tokio::main, which is the one place tokio will let you run an async function that holds !Send over an await point (as you cannot spawn a task that holds a !Send value over an await point.
Since this is an unusual configuration, and CacheRef otherwise is such an improvement to the serenity API, I am going to say this is a documentation issue, not a bug in serenity. We need to document that CacheRef should not be held over an await, even though the compiler will stop you in 99% of times.
@Gaming32 If I'm understanding from your code, you avoid the deadlock by always getting the guild channel? Rather than re-using one already gotten?
Correct. This is fairly cheap since it's just a cache lookup.
I can see the problem here. Usually,
CacheRefwill prevent you holding any reference to the cache over an await point, which avoids any deadlocking... but you are running in an unusual way where you are performing this cache lookup intokio::main, which is the one place tokio will let you run an async function that holds !Send over an await point (as you cannot spawn a task that holds a!Sendvalue over an await point.Since this is an unusual configuration, and
CacheRefotherwise is such an improvement to the serenity API, I am going to say this is a documentation issue, not a bug in serenity. We need to document thatCacheRefshould not be held over an await, even though the compiler will stop you in 99% of times.
Sounds reasonable.