rust-stm
rust-stm copied to clipboard
Two transactions may deadlock
As the issue title already states, I found a case in which I (relatively deterministically) manage to produce a deadlock between two transactions. The problem arises from two threads getting stuck in the wait_for_change
function which is -- as far as I understood it -- intended to avoid unnecessary recomputations.
I encountered this problem when I tried to implement a variation of Lee's Algorithm for a benchmark using a data structure like this:
pub type StmGrid = Vec<Vec<Vec<TVar<Field>>>>; // 3D maze
If necessary, I can provide the code which produces the deadlock.
This made me wonder if this waiting for a change in the underlying data is always a wise thing. Especially in cases like this where many small transactions occur which all only touch a small portion of all TVars, it seems rather counter-intuitive to me and in the end I had to patch the waiting for change out to make the benchmark work.
Hello Felix,
sorry for the late response.
In general deadlocks may happen, but only in situations where not waiting would create livelocks. So if everything works, when you skip the check, its propably a bug.
I do not want to remove the waiting time, bacause there are other scenarios in which you want to wait for a rare event, like a value on a queue, that is pushed by a slow producer.
But I might add something like a quick_retry() function.
Can you show me the code?
Thank you for looking into this!
Interesting, so it seems that the deadlock is due to bad timing on behalf of the scheduler? Is this even resolvable? 🤔
I was trying to come up with a workaround for this issue when I noticed that all the tests in control_block
invoke wait
on a thread which is different from the one that gets unparked in set_changed
. I could be wrong, but I think the test Perform a wakeup from another thread. only worked because by the time wait
was called it already set blocked
to false
. At least this is what I saw when I added some delay before set_changed
.
There is a thread::park_timeout
variant that solves this issue as well as the deadlock pointed out above by not letting the thread be parked forever, it gives it a chance to check blocked
from time to time.
Although based on the documentation of park
I don't fully understand why the deadlock happens when set_changed
runs between checking blocked
and calling park
; the docs say a if unpark
is called first, the next call to park
doesn't block, just consumes the token.
In any case, I added a timeout here. Do you agree with this approach?
I suppose the alternative is to use a Condvar but there must be a reason you wanted to go with park
.