concurrency: update wait queues in AddDiscoveredLock

Open stevendanna opened this issue 6 months ago • 3 comments

Concurrent readers in the face of a concurrent, retrying writer can result in a situation where AddDiscoveredLock moves the timestamp of a held intent past the read timestamp of a waiting reader. The reader should be unblocked in this case but previously wasn't, resulting in a lock table verification assertion failure of the form:

error: non locking reader ... does not conflict with lock holder

This is my best theory for what is happening in #146749 based on increased logging. It is difficult to be certain because the required ordering of events is hard to observe with locking.

See the comment in the test for a more complete timeline that can lead to the bug.
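
As a rough illustration of the invariant involved (not the actual lock table code), the sketch below models a single lock with waiting non-locking readers. All names here (`lockState`, `reader`, `addDiscovered`, `conflicts`) are hypothetical, and timestamps are simplified to integers. It assumes a non-locking read only conflicts with an intent whose timestamp is at or below the read timestamp, so forwarding the intent's timestamp should release any reader that now sits below it.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an MVCC timestamp (assumption).
type Timestamp int64

// reader models a non-locking reader waiting in a lock's wait queue.
type reader struct {
	name   string
	readTS Timestamp
}

// lockState is a hypothetical, heavily simplified model of one held lock:
// the holder's intent timestamp plus the readers waiting on it.
type lockState struct {
	holderTS       Timestamp
	waitingReaders []reader
}

// conflicts reports whether a non-locking read at readTS conflicts with an
// intent at holderTS. Reads only conflict with intents at or below them.
func conflicts(readTS, holderTS Timestamp) bool {
	return holderTS <= readTS
}

// addDiscovered forwards the holder's timestamp to newTS (never backwards)
// and then releases every waiting reader that no longer conflicts.
// It returns the released readers. Skipping the release step would leave a
// non-conflicting reader in the queue, which is the state the assertion
// in the description complains about.
func (l *lockState) addDiscovered(newTS Timestamp) []reader {
	if newTS > l.holderTS {
		l.holderTS = newTS
	}
	var released []reader
	kept := l.waitingReaders[:0] // filter in place; released holds copies
	for _, r := range l.waitingReaders {
		if conflicts(r.readTS, l.holderTS) {
			kept = append(kept, r)
		} else {
			released = append(released, r)
		}
	}
	l.waitingReaders = kept
	return released
}

func main() {
	l := &lockState{
		holderTS: 10,
		waitingReaders: []reader{
			{name: "r1", readTS: 12},
			{name: "r2", readTS: 20},
		},
	}
	// A retrying writer's intent is rediscovered at a higher timestamp.
	released := l.addDiscovered(15)
	fmt.Println("released:", released)
	fmt.Println("still waiting:", l.waitingReaders)
}
```

In this toy model, the reader at timestamp 12 is released once the intent moves to 15, while the reader at 20 keeps waiting. The real wait queues carry far more state than this, so treat the sketch only as a picture of the condition being checked.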

Fixes #146749

Release note: None

stevendanna avatar May 21 '25 12:05 stevendanna

This change is Reviewable

cockroach-teamcity avatar May 21 '25 12:05 cockroach-teamcity

@arulajmani If you have some time for review, this would clear a flaky test.

stevendanna avatar May 28 '25 14:05 stevendanna

Sorry I missed this. I'll have a look today after our weekly meeting.

arulajmani avatar May 28 '25 14:05 arulajmani

I've opened an experiment for the change to the latch timestamp here: https://github.com/cockroachdb/cockroach/pull/148802

I personally think we should probably do both fixes.

stevendanna avatar Jun 25 '25 12:06 stevendanna

I'm going to merge this since it appears correct and we are post-branch cut so there is lots of bake time. We also have an open PR for changing the timestamp at which we latch.

bors r=miraradeva

stevendanna avatar Sep 26 '25 12:09 stevendanna