concurrency: update wait queues in AddDiscoveredLock
Readers running concurrently with a retrying writer can hit a situation where AddDiscoveredLock moves the timestamp of a held intent past the read timestamp of a waiting reader. The reader should be unblocked in this case but previously wasn't, resulting in a lock table verification assertion failure of the form:
error: non locking reader ... does not conflict with lock holder
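To illustrate the invariant behind that assertion, here is a minimal, self-contained sketch. It is not the actual lock table code: `Timestamp`, `reader`, `lockState`, `conflicts`, and `updateHolderTS` are hypothetical names invented for this example. It assumes the usual rule that a non-locking read only conflicts with an intent whose timestamp is at or below the read timestamp, so once the holder's timestamp moves above a waiting reader's timestamp, that reader has to leave the wait queue.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an MVCC timestamp (hypothetical).
type Timestamp int64

// reader models a non-locking reader waiting on a lock (hypothetical).
type reader struct {
	id     int
	readTS Timestamp
}

// lockState is a toy model of one lock with its waiting readers.
// It is not the real lock table data structure.
type lockState struct {
	holderTS       Timestamp
	waitingReaders []reader
}

// conflicts reports whether a non-locking read at readTS conflicts with an
// intent held at holderTS. Assumed rule: the read conflicts only if the
// intent is at or below the read timestamp.
func conflicts(readTS, holderTS Timestamp) bool {
	return holderTS <= readTS
}

// updateHolderTS ratchets the holder's timestamp forward to ts and returns
// the readers that no longer conflict. Skipping the release step leaves
// readers queued behind a lock they do not conflict with, which is the
// state the assertion complains about.
func (l *lockState) updateHolderTS(ts Timestamp) []reader {
	if ts > l.holderTS {
		l.holderTS = ts
	}
	var released, kept []reader
	for _, r := range l.waitingReaders {
		if conflicts(r.readTS, l.holderTS) {
			kept = append(kept, r)
		} else {
			released = append(released, r)
		}
	}
	l.waitingReaders = kept
	return released
}

func main() {
	// Intent held at ts=10; readers at ts=15 and ts=25 are both waiting.
	l := &lockState{
		holderTS:       10,
		waitingReaders: []reader{{id: 1, readTS: 15}, {id: 2, readTS: 25}},
	}
	// The writer retries and its intent is rediscovered at ts=20.
	released := l.updateHolderTS(20)
	fmt.Println("released:", released)
	fmt.Println("still waiting:", l.waitingReaders)
}
```

In this toy model the reader at timestamp 15 is released when the intent moves to 20, while the reader at 25 keeps waiting because it still conflicts.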
This is my best theory for what is happening in #146749, based on increased logging. It is difficult to be certain, since the required ordering of events is hard to observe with locking.
See the comment in the test for a more complete timeline that can lead to the bug.
Fixes #146749
Release note: None
@arulajmani If you have some time for review, this would clear a flaky test.
Sorry I missed this. I'll have a look today after our weekly meeting.
I've opened an experiment for the change to the latch timestamp here: https://github.com/cockroachdb/cockroach/pull/148802
I personally think we should probably do both fixes.
I'm going to merge this since it appears correct and we are past the branch cut, so there is plenty of bake time. We also have an open PR for changing the timestamp at which we latch.
bors r=miraradeva