rust-stm icon indicating copy to clipboard operation
rust-stm copied to clipboard

Two transactions may deadlock

Open Feliix42 opened this issue 4 years ago • 4 comments

As the issue title already states, I found a case in which I (relatively deterministically) manage to produce a deadlock between two transactions. The problem arises from two threads getting stuck in the wait_for_change function which is -- as far as I understood it -- intended to avoid unnecessary recomputations.

I encountered this problem when I tried to implement a variation of Lee's Algorithm for a benchmark using a data structure like this:

 pub type StmGrid = Vec<Vec<Vec<TVar<Field>>>>; // 3D maze

If necessary, I can provide the code which produces the deadlock.

This made me wonder if this waiting for a change in the underlying data is always a wise thing. Especially in cases like this where many small transactions occur which all only touch a small portion of all TVars, it seems rather counter-intuitive to me and in the end I had to patch the waiting for change out to make the benchmark work.

Feliix42 avatar Jun 19 '20 18:06 Feliix42

Hello Felix,

sorry for the late response.

In general deadlocks may happen, but only in situations where not waiting would create livelocks. So if everything works, when you skip the check, its propably a bug.

I do not want to remove the waiting time, bacause there are other scenarios in which you want to wait for a rare event, like a value on a queue, that is pushed by a slow producer.

But I might add something like a quick_retry() function.

Can you show me the code?

Marthog avatar Aug 14 '20 14:08 Marthog

I found the issue:

Deadlocks happen, when set_changed runs right between these two lines

Marthog avatar Aug 14 '20 14:08 Marthog

Thank you for looking into this!

Interesting, so it seems that the deadlock is due to bad timing on behalf of the scheduler? Is this even resolvable? 🤔

Feliix42 avatar Aug 23 '20 18:08 Feliix42

I was trying to come up with a workaround for this issue when I noticed that all the tests in control_block invoke wait on a thread which is different from the one that gets unparked in set_changed. I could be wrong, but I think the test Perform a wakeup from another thread. only worked because by the time wait was called it already set blocked to false. At least this is what I saw when I added some delay before set_changed.

There is a thread::park_timeout variant that solves this issue as well as the deadlock pointed out above by not letting the thread be parked forever, it gives it a chance to check blocked from time to time.

Although based on the documentation of park I don't fully understand why the deadlock happens when set_changed runs between checking blocked and calling park; the docs say a if unpark is called first, the next call to park doesn't block, just consumes the token.

In any case, I added a timeout here. Do you agree with this approach?

I suppose the alternative is to use a Condvar but there must be a reason you wanted to go with park.

aakoshh avatar Oct 25 '21 23:10 aakoshh