parking_lot

Mutex stuck locking forever under weird conditions

Open · Jerald opened this issue 6 years ago · 4 comments

I have a very weird situation where compiling a library on the machine my program is running on causes the mutex to get stuck locking, but only when it's called from specific places. There are a lot of moving parts to my project, so I honestly doubt much can be fixed based on it. All I know is that using the mutex from parking_lot causes it to get stuck locking, while using the std::sync mutex works fine.

Below I've included a detailed description of my situation, but as I said, I honestly don't expect anything helpful to come from this due to the complexity of the whole thing.

If there's any other information I can give to help with the issue, let me know.


My project is a Discord bot using the Serenity crate. I'm implementing a fancy system to dynamically reload commands from a Rust dylib, with help from the libloading crate. To keep things thread-safe, I have a mutex around the inner framework of the bot, which is modified whenever something is reloaded. All of a sudden I started getting segfaults and other weird problems, which was quite confusing. After some debugging, I found out that some calls to lock the inner framework never completed, leaving things in a weird, broken state.

Specifically, if I'm recompiling the program and I issue a command that locks the inner framework, the locking process gets stuck. Notably, this only seems to happen when the lock request comes from some of the dynamically loaded code, since the main core of the bot can still lock the framework with no problem. Whether it's a coincidence that the dynamically loaded code is what triggers the problem is unknown. I haven't been able to make a minimal reproducible example of the error outside of my specific situation, though it is consistently reproducible within what I have.

I've tried using fair unlocking, thinking there was lock contention, but that had no effect. Calling try_lock_until timed out, which proved to me that the thread hadn't actually crashed; it was just stuck waiting for the lock forever. As I said, switching to the std::sync mutex is the only thing that seems to fix the problem.
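The timeout check described above can be reproduced without parking_lot. The sketch below is a std-only approximation of a timed lock: `lock_with_deadline` and `demo` are hypothetical names, and polling `try_lock` with a short sleep is an assumed stand-in for parking_lot's `try_lock_until`, which parks the thread instead of polling. It shows the idea behind the diagnosis: a deadline turns "stuck forever" into a `None` you can observe, while the calling thread keeps running.

```rust
use std::sync::{mpsc, Mutex, MutexGuard, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: keep calling `try_lock` until `deadline`, returning `None`
/// if the lock never became free. A crude polling stand-in for a timed lock.
fn lock_with_deadline<T>(m: &Mutex<T>, deadline: Instant) -> Option<MutexGuard<'_, T>> {
    loop {
        match m.try_lock() {
            Ok(guard) => return Some(guard),
            // A previous holder that panicked still released the lock; take it anyway.
            Err(TryLockError::Poisoned(poisoned)) => return Some(poisoned.into_inner()),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    return None; // still held by someone else: report it instead of hanging
                }
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// Returns (gave up while the lock was held, acquired after it was released).
fn demo() -> (bool, bool) {
    let m = Mutex::new(0u32);
    let m_ref = &m;
    let (held_tx, held_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();
    thread::scope(|s| {
        // A holder that keeps the lock until it is explicitly told to let go.
        s.spawn(move || {
            let _guard = m_ref.lock().unwrap();
            held_tx.send(()).unwrap();
            release_rx.recv().unwrap();
        });
        held_rx.recv().unwrap();
        // Short deadline while the lock is held: gives up rather than blocking forever.
        let gave_up = lock_with_deadline(m_ref, Instant::now() + Duration::from_millis(50)).is_none();
        release_tx.send(()).unwrap();
        // Generous deadline after release: succeeds once the holder drops its guard.
        let acquired = lock_with_deadline(m_ref, Instant::now() + Duration::from_secs(5)).is_some();
        (gave_up, acquired)
    })
}

fn main() {
    println!("{:?}", demo()); // (true, true)
}
```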

Here's a link to the repo of my project: https://github.com/Jerald/fracking-toaster The code there isn't entirely up to date, but the part that causes the issue is still present.

Jerald · Aug 15 '19 17:08

I think I know what the problem is here. Because you are using the dylib crate type, there are in fact two entirely separate instances of the parking lot: one in the main executable and one in the dylib (in fact, there is also a separate copy of toaster-core in the dylib and in the executable). If a thread is parked (i.e. sleeping) in one of the parking lots, it won't get woken up if the wakeups are performed in the other parking lot.
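To make the "two separate parking lots" explanation concrete, here is a toy model in plain std Rust. `ToyParkingLot`, `park`, `unpark_one`, `woke_up` and `run_demo` are hypothetical; the names echo parking_lot_core, but the signatures and the implementation (a `Mutex<HashMap>` of pending wakeups plus a `Condvar`) are invented for illustration. Each `ToyParkingLot` value plays the role of one statically linked copy of the crate: a thread parked in the executable's copy only wakes if the wakeup is delivered to that same copy.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical, heavily simplified stand-in for a global parking lot:
/// it records pending wakeups per key and lets threads sleep on a key.
struct ToyParkingLot {
    pending: Mutex<HashMap<usize, usize>>, // key -> number of undelivered wakeups
    cv: Condvar,
}

impl ToyParkingLot {
    fn new() -> Self {
        ToyParkingLot { pending: Mutex::new(HashMap::new()), cv: Condvar::new() }
    }

    /// Sleep on `key` until a wakeup arrives in *this* lot; returns false on timeout.
    fn park(&self, key: usize, timeout: Duration) -> bool {
        let guard = self.pending.lock().unwrap();
        let (mut guard, result) = self
            .cv
            .wait_timeout_while(guard, timeout, |p| p.get(&key).copied().unwrap_or(0) == 0)
            .unwrap();
        if result.timed_out() {
            return false;
        }
        *guard.get_mut(&key).unwrap() -= 1; // consume the wakeup
        true
    }

    /// Deliver one wakeup for `key`, visible only to threads parked in *this* lot.
    fn unpark_one(&self, key: usize) {
        *self.pending.lock().unwrap().entry(key).or_insert(0) += 1;
        self.cv.notify_all();
    }
}

/// Park a thread in `park_in`, send the wakeup via `wake_via`, report whether it woke.
fn woke_up(park_in: &Arc<ToyParkingLot>, wake_via: &ToyParkingLot, key: usize) -> bool {
    let lot = Arc::clone(park_in);
    let sleeper = thread::spawn(move || lot.park(key, Duration::from_millis(300)));
    wake_via.unpark_one(key);
    sleeper.join().unwrap()
}

fn run_demo() -> (bool, bool) {
    let key = 0x1000; // pretend this is the address of the contended mutex
    let exe_lot = Arc::new(ToyParkingLot::new()); // copy linked into the executable
    let plugin_lot = Arc::new(ToyParkingLot::new()); // copy linked into the dylib
    let across = woke_up(&exe_lot, &plugin_lot, key); // wakeup sent to the other copy
    let same = woke_up(&exe_lot, &exe_lot, key); // wakeup sent to the same copy
    (across, same)
}

fn main() {
    let (across, same) = run_demo();
    println!("woken through the other copy: {across}"); // false: sleeps until the timeout
    println!("woken through the same copy: {same}"); // true
}
```

The 300 ms timeout only exists so the demo terminates; with a plain `lock()` there is no timeout, so a thread parked in the wrong copy would simply never wake.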

The solution to your issue is to compile every crate as a separate dylib, so that both the plugin and the executable can share the same parking_lot dylib. This can be done with -C prefer-dynamic; search a bit on Google for how to use it (I've never used it myself, so I can't help you there).

Amanieu · Aug 15 '19 18:08

Unfortunately, I don't think the problem is that simple. Linking with prefer-dynamic would help with a few things, and is in fact something I'm planning to look into soon, but I'm pretty sure it won't help here.

If the situation you suggested were the cause, there's no reason it would ever work. But notably, in my situation things only stop working when I'm compiling at the same time. My off-the-cuff guess is that it's caused by a lack of available cores/threads on my machine, but that's only conjecture.

When I get the prefer-dynamic linking up I'll try out the parking_lot mutex again and let you know if it works.

Jerald · Aug 15 '19 18:08

The parking lot is only involved if there is contention on a mutex, i.e. a thread tries to lock it while another thread is already holding the lock. Otherwise everything goes through an inline fast path that never touches the global parking lot structure (the structure that is duplicated here and causes the issue).
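The fast-path/slow-path split can be sketched with a hypothetical `ToyRawMutex`. As an assumption made for brevity, its slow path spins with `thread::yield_now` where parking_lot would park the thread in the global parking lot, and it counts how often the slow path runs so the split is observable. Roughly, parking_lot's own mutex also tries a single compare-exchange on its state before falling back to a slow path, but this is a simplified model, not its code.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical toy lock with a fast path and a slow path.
struct ToyRawMutex {
    locked: AtomicBool,
    slow_path_entries: AtomicUsize,
}

impl ToyRawMutex {
    const fn new() -> Self {
        ToyRawMutex { locked: AtomicBool::new(false), slow_path_entries: AtomicUsize::new(0) }
    }

    fn lock(&self) {
        // Fast path: one compare-and-swap, no shared/global structure involved.
        if self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        self.lock_slow();
    }

    #[cold]
    fn lock_slow(&self) {
        // Only reached under contention: this is where a parking lot would be consulted.
        self.slow_path_entries.fetch_add(1, Ordering::Relaxed);
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            thread::yield_now();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }

    fn slow_path_entries(&self) -> usize {
        self.slow_path_entries.load(Ordering::Relaxed)
    }
}

/// Lock and unlock many times from a single thread: never contended.
fn uncontended_slow_entries() -> usize {
    let m = ToyRawMutex::new();
    for _ in 0..1_000 {
        m.lock();
        m.unlock();
    }
    m.slow_path_entries()
}

/// Hold the lock while a second thread tries to take it.
fn contended_slow_entries() -> usize {
    let m = ToyRawMutex::new();
    m.lock();
    thread::scope(|s| {
        s.spawn(|| {
            m.lock(); // fails the fast path because the main thread holds the lock
            m.unlock();
        });
        // Keep holding until the other thread has been pushed onto the slow path.
        while m.slow_path_entries() == 0 {
            thread::yield_now();
        }
        m.unlock();
    });
    m.slow_path_entries()
}

fn main() {
    println!("uncontended: {}", uncontended_slow_entries()); // 0
    println!("contended: {}", contended_slow_entries()); // 1
}
```

Because uncontended calls never leave the fast path, a duplicated parking lot stays harmless until two threads actually collide on the lock.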

This problem only occurs while you are compiling because the OS has to suspend threads to perform task switching when there are more active threads than cores. That makes it more likely that a thread is suspended while holding a lock, which causes other threads to block when they try to acquire that lock.
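The effect of preemption can be imitated with std's `Mutex`. In this sketch, `time_until_acquired` (a hypothetical name) has a holder thread sleep while holding the lock; the sleep is an assumed stand-in for the OS descheduling that thread in the middle of its critical section. While the holder is not running, `try_lock` fails and `lock` has to wait, which is the contended situation that sends a parking_lot mutex to its parking lot.

```rust
use std::sync::{mpsc, Arc, Mutex, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Simulate a lock holder that is descheduled mid-critical-section (modelled, as an
/// assumption, by `thread::sleep`) and return how long it takes the main thread,
/// measured from before the holder is spawned, to acquire the lock.
fn time_until_acquired(stall: Duration) -> Duration {
    let start = Instant::now();
    let shared = Arc::new(Mutex::new(0u32));
    let (held_tx, held_rx) = mpsc::channel::<()>();

    let holder = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut guard = shared.lock().unwrap();
            held_tx.send(()).unwrap(); // the lock is now held
            thread::sleep(stall); // stand-in for being preempted while holding it
            *guard += 1;
        })
    };

    held_rx.recv().unwrap();
    // The holder is not running but still owns the lock, so a non-blocking attempt fails...
    assert!(matches!(shared.try_lock(), Err(TryLockError::WouldBlock)));
    // ...and a blocking attempt has to wait for the stalled holder to resume and unlock.
    let guard = shared.lock().unwrap();
    assert_eq!(*guard, 1); // the holder finished its critical section first
    drop(guard);
    holder.join().unwrap();
    start.elapsed()
}

fn main() {
    let waited = time_until_acquired(Duration::from_millis(200));
    println!("waited {waited:?}"); // at least 200ms
}
```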

Amanieu · Aug 16 '19 00:08

I see, that's interesting. I wasn't aware the parking structure (no pun intended) was only used in contended situations. That would explain the problems, then.

I'm currently working on getting fully dynamic linking via prefer-dynamic working, so I'll report back on whether that fixes the issue, hopefully soon.

Jerald · Aug 16 '19 21:08