dashmap
Ways to debug deadlocks
Hi!
What is the recommended approach to debugging deadlocks involving DashMap? Is there any tooling for that purpose?
Thanks
Hey, this can be a bit tricky. Usually I just fire up lmdb, dump the backtraces of every thread when a deadlock happens, and go through their call stacks. Does that help?
Thanks, I will give that a try. For tokio::sync::RwLock, the tokio-console crate can be used. I think my issue is not related to this library, so I'm going to close the issue.
Could something like this be integrated into Dashmap for debugging purposes? https://lib.rs/crates/no_deadlocks
Do you have any references or documentation for that (lmdb)?
I wanted to give lockbud (https://github.com/BurtonQin/lockbud) a try, but it does not work on macOS. Which approach do you normally take to get the backtraces of each thread?
Sorry, I made a typo: I meant the lldb debugger. I run my program, and when it hangs I do thread apply all bt (that is the gdb spelling; in lldb the command is thread backtrace all, or bt all) and sift through the backtraces.
I ran (lldb) thread backtrace all and probably found the deadlocked call, which I had also tracked down through my logs. However, I can't find a second lock that is blocking it.
thread #8, name = 'tokio-runtime-worker'
frame #0: 0x000000019209e5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001920da638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x00000001010708cc butterbrot`dashmap::lock::RawRwLock::lock_exclusive_slow::h3eceab46f26c3724 + 624
frame #3: 0x0000000100a4e75c butterbrot`butterbrot_rust::core::load_and_store::_$u7b$$u7b$closure$u7d$$u7d$::h07759cdff2954bfc + 3948
Is it correct that there must be another RawRwLock or dashmap::lock frame in the backtraces (relating to the same DashMap) to confirm that there is a deadlock? Unfortunately, I have not found one.
Hi @xacrimon.
In https://github.com/xacrimon/dashmap/issues/79 @notgull says:
I think "don't hold a lock across a .await" should be documented.
Might this be the root cause of my issue? I have something like:
let account = dm.get_mut(&account);
if let Some(mut a) = account {
    a.value_mut().save("update").await; // <-- holding the guard across an await point
}
Reproducer
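use std::{sync::Arc, time::Duration};
use dashmap::DashMap;

#[tokio::test]
async fn dashmap_async_test() {
    struct CanAsyncSave {}
    impl CanAsyncSave {
        pub async fn save(&mut self) {
            tokio::time::sleep(Duration::from_millis(1)).await;
        }
    }
    let dm: Arc<DashMap<String, CanAsyncSave>> = Default::default();
    let dm_clone = dm.clone();
    tokio::task::spawn(async move {
        for _ in 0..100 {
            // Blocks the runtime thread while the test body below holds the shard write lock.
            let mut entry = dm_clone.get_mut("1").unwrap();
            let val = entry.value_mut();
            val.save().await; // write guard `entry` is held across this .await
        }
    });
    for _ in 0..100 {
        dm.insert("1".into(), CanAsyncSave {});
        let mut entry = dm.get_mut("1").unwrap();
        let val = entry.value_mut();
        // The write guard stays alive while this task is suspended in sleep().
        // On the default current-thread #[tokio::test] runtime, the spawned task then
        // blocks the only thread inside get_mut, the timer never fires, and the test hangs.
        val.save().await;
    }
}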
This sounds much like using a std Mutex, parking_lot, etc., where you want to avoid holding a lock across an .await. In cases like that I would switch to a lock that is await-aware, avoid awaiting while the lock is held, or restructure the code.
This very helpful post covers how to make Dashmap not deadlock in async code: https://draft.ryhl.io/blog/shared-mutable-state/ It also explains why the compiler does not warn about these deadlocks.
The gist is never to .await anything while holding a lock. A lock is taken when you access the DashMap and released when the returned guard is dropped... if I got it right, that is.
The recommended approach is to never access the DashMap directly from async code, but only through a convenience wrapper whose methods are not async.
I ran into this while looping over my DashMap's iterator and performing an async operation inside the loop. This deadlocked my app unpredictably.
My solution was to collect the values from the iterator synchronously first, then loop over the collected values to perform my async operation.
I was reading Alice's blog that was mentioned in this thread.
Alice points out that the compiler doesn't complain about holding a guard (or reference) across an .await because the guard is Send.
I wonder whether the Send implementations of RefMut, RefMulti, etc. could be put behind a feature gate, or whether a no_send feature could be added. I'm sure there's a good reason these types are Send, but if users could turn that off, it could make debugging easier.