Potential deadlock caused by concurrent sync calls

Open wangvsa opened this issue 1 year ago • 0 comments

Describe the problem you're observing

I'm seeing TIMEOUT errors when staging in many files simultaneously. It seems that concurrent unifyfs_sync() calls can cause a deadlock on the server side. After some investigation, I found that the servers block in the process_pending_sync call in the following case:

client A on server 0 --> write/sync file 1 --> owner is server 1
client B on server 1 --> write/sync file 2 --> owner is server 0
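To make the suspected circular wait concrete, here is a minimal toy model in C. It is not UnifyFS code: `toy_cluster`, `toy_begin_sync`, and `toy_is_deadlocked` are hypothetical names, and the model assumes that a server forwarding a sync to a remote owner cannot service incoming requests until it gets a reply. Under that assumption, the two syncs above form a cycle in the wait-for graph.

```c
#include <stdbool.h>

#define NUM_SERVERS 2
#define NOT_WAITING (-1)

/* Toy model only (hypothetical, not UnifyFS code).
 * waits_on[s] is the rank that server s is blocked on while it forwards
 * a sync to the file's owner, or NOT_WAITING if s is free to serve. */
typedef struct {
    int waits_on[NUM_SERVERS];
} toy_cluster;

/* Mark every server as idle. */
void toy_cluster_init(toy_cluster *c)
{
    for (int s = 0; s < NUM_SERVERS; s++) {
        c->waits_on[s] = NOT_WAITING;
    }
}

/* Assumption: a sync for a remotely owned file blocks the local server
 * until the owner replies. A locally owned file needs no remote wait. */
void toy_begin_sync(toy_cluster *c, int local, int owner)
{
    if (local != owner) {
        c->waits_on[local] = owner;
    }
}

/* Follow wait edges from 'start'. If the chain leads back to 'start',
 * no server on the chain can ever reply, so 'start' is deadlocked. */
bool toy_is_deadlocked(const toy_cluster *c, int start)
{
    int cur = c->waits_on[start];
    for (int hops = 0; hops < NUM_SERVERS && cur != NOT_WAITING; hops++) {
        if (cur == start) {
            return true;
        }
        cur = c->waits_on[cur];
    }
    return false;
}
```

Calling `toy_begin_sync(&c, 0, 1)` alone leaves server 1 free to reply, so nothing is stuck. Adding `toy_begin_sync(&c, 1, 0)` closes the cycle, and `toy_is_deadlocked` returns true for both servers, which would match the TIMEOUT symptom if the blocking assumption holds.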

https://github.com/LLNL/UnifyFS/blob/58ece4441716678f5111a6dbff9baadd6188c2b6/server/src/unifyfs_service_manager.c#L1479-L1483

@MichaelBrim Is this the cause? Any idea how to fix this?

wangvsa, Mar 19 '24 23:03