
Deadlock due to outdated lock_info

Open • xinhaoyuan opened this issue 7 years ago • 0 comments

I was testing locks with my test case. I believe there is a bug in how locks_server and locks_agent handle lock_info, and it can cause a deadlock.

My test case has 3 concurrent clients/agents, C1, C2, and C3, and 3 locks, [1], [2], and [3].

  • C1 requests locks in the order of [[1], [2], [3]]
  • C2 requests locks in the order of [[2], [3], [1]]
  • C3 requests locks in the order of [[3], [1], [2]]
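These three orders form a cycle: if each client gets its first lock before anyone else, every client ends up waiting on a lock held by the next one, which is exactly the situation the deadlock resolution in locks has to break. The following is a minimal Python sketch, not code from locks; the function names are hypothetical and the ints 1, 2, 3 stand in for the lock names [1], [2], [3]. It builds the resulting wait-for graph and finds the cycle:

```python
# Lock request orders from the test case; 1, 2, 3 stand in for [1], [2], [3].
ORDERS = {
    "C1": [1, 2, 3],
    "C2": [2, 3, 1],
    "C3": [3, 1, 2],
}

def wait_for_graph(orders):
    """Assume each client holds its first lock and waits for its second one.
    Return an edge from each client to the client holding the lock it waits on."""
    holder = {order[0]: client for client, order in orders.items()}
    return {client: holder[order[1]] for client, order in orders.items()}

def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle once a client repeats."""
    path = []
    node = start
    while node not in path:
        path.append(node)
        node = graph.get(node)
        if node is None:
            return None  # reached a client that is not waiting: no cycle
    return path[path.index(node):]

graph = wait_for_graph(ORDERS)
print(graph)                    # {'C1': 'C2', 'C2': 'C3', 'C3': 'C1'}
print(find_cycle(graph, "C1"))  # ['C1', 'C2', 'C3']
```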

Here is a sketch of how the bug happened:

  1. C1, C2, and C3 competed for the locks. Thanks to the deadlock resolution algorithm, C1 and C2 eventually acquired all the locks and finished.

  2. During resolution, C3 received the lock_info of [2] (via locks_agent:send_indirects/1) even though C3 had not yet reached the point of requesting it, which means C3 was not in [2]'s queue.

  3. The locks_server removed its local lock_info entry for [2] because the queue was now empty. This effectively reset the vsn of that lock_info.

  4. When C3 then requested [2], the locks_server responded with a lock_info whose vsn was lower than the one C3 had already been told about. As a result, C3 got stuck.
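Steps 2 to 4 can be reproduced in a small toy model. The Python sketch below is not the locks implementation. It assumes a simplified lock_info made of a vsn counter and a queue. It also assumes the agent drops any lock_info whose vsn is not newer than the last one it saw for that lock. `Server`, `Agent`, `LockInfo`, `drop_empty`, and `c3_sees_its_request` are hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class LockInfo:
    # Simplified stand-in for lock_info: a version number and a wait queue.
    vsn: int = 0
    queue: list = field(default_factory=list)

class Server:
    """Toy model of locks_server: one LockInfo per lock, optionally dropped when idle."""
    def __init__(self, drop_empty=True):
        self.entries = {}
        self.drop_empty = drop_empty

    def request(self, lock, client):
        info = self.entries.setdefault(lock, LockInfo())
        info.vsn += 1
        info.queue.append(client)
        return LockInfo(info.vsn, list(info.queue))

    def release(self, lock, client):
        info = self.entries[lock]
        info.vsn += 1
        info.queue.remove(client)
        snapshot = LockInfo(info.vsn, list(info.queue))
        if not info.queue and self.drop_empty:
            # Step 3: the entry is deleted, so the next request restarts from vsn 0.
            del self.entries[lock]
        return snapshot

class Agent:
    """Toy model of locks_agent (assumption): ignores lock_info that is not newer."""
    def __init__(self):
        self.seen = {}

    def on_lock_info(self, lock, info):
        if info.vsn <= self.seen.get(lock, 0):
            return False  # treated as stale and dropped
        self.seen[lock] = info.vsn
        return True

def c3_sees_its_request(drop_empty):
    server, c3 = Server(drop_empty), Agent()
    server.request(2, "C1")
    info = server.request(2, "C2")
    c3.on_lock_info(2, info)         # step 2: indirect lock_info, C3 not queued yet
    server.release(2, "C1")
    server.release(2, "C2")          # step 3: queue of [2] is now empty
    reply = server.request(2, "C3")  # step 4: C3 finally requests [2]
    return reply.vsn, c3.on_lock_info(2, reply)

print(c3_sees_its_request(drop_empty=True))   # (1, False): reply looks stale to C3
print(c3_sees_its_request(drop_empty=False))  # (5, True)
```

In this toy model, keeping the entry (`drop_empty=False`) keeps the vsn monotonic, so C3 accepts the reply. The model leaves out everything else locks_server and locks_agent do with lock_info, so it says nothing about why that change makes the test fail in other ways.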

I tried to fix this by not removing lock_info entries in locks_server, but with that change the test seems to fail in other ways. Maybe this breaks the algorithm?

xinhaoyuan, Aug 28 '18 14:08