
[Bug]: Read State

Open bsbds opened this issue 1 year ago • 3 comments

There are several read-state related bugs found in the jepsen tests.

  • [x] 1. Lost cmd ids. Sometimes cmd ids are lost during the test. This is caused by the current read state collecting only the sp (speculative pool), which is not sufficient: the sp only stores the currently speculatively executed cmds, while later conflicting commands go into the ucp (uncommitted pool). The solution is to add the ucp cmds to the read state response.
  • [x] 2. Index barrier does not trigger in order. Currently the IndexBarrier uses last_trigger_index to track the last executed index. However, the after-sync stage is executed concurrently, so a command with a larger index may complete before a command with a smaller index. This requires a refactor.
    /// Trigger all barriers whose index is less than or equal to the given index.
    pub(crate) fn trigger(&self, index: u64) {
        let mut inner_l = self.inner.lock();
        if inner_l.last_trigger_index < index {
            inner_l.last_trigger_index = index;
        }
        let mut split_barriers = inner_l.barriers.split_off(&(index.overflow_add(1)));
        std::mem::swap(&mut inner_l.barriers, &mut split_barriers);
        for (_, barrier) in split_barriers {
            barrier.notify(usize::MAX);
        }
    }
  • [x] 3. Commit index on leader may be newer than uncommitted cmds. The commit index may be newer because the after-sync stage is executed concurrently, so a cmd with a higher log index may complete before a cmd with a lower one. To resolve this, we should combine waiting on the commit index with waiting on the cmd ids; only after both return can we proceed with a serializable read on the current node.
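The combined wait described in the third item can be sketched with a small gate that blocks the read until both conditions hold. This is not Xline's actual API; it is a minimal sketch using a `Mutex`/`Condvar` pair, and all names (`ReadGate`, `commit_index_applied`, `cmd_ids_done`, `wait_read`) are hypothetical:

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical gate for a serializable read on the current node:
/// the read proceeds only after BOTH the commit index has been applied
/// AND the conflicting cmd ids have finished after-sync.
pub struct ReadGate {
    /// (commit_index_applied, cmd_ids_done)
    state: Mutex<(bool, bool)>,
    cvar: Condvar,
}

impl ReadGate {
    pub fn new() -> Self {
        Self {
            state: Mutex::new((false, false)),
            cvar: Condvar::new(),
        }
    }

    /// Called once the node has applied up to the leader's commit index.
    pub fn commit_index_applied(&self) {
        let mut s = self.state.lock().unwrap();
        s.0 = true;
        self.cvar.notify_all();
    }

    /// Called once the conflicting cmd ids have completed.
    pub fn cmd_ids_done(&self) {
        let mut s = self.state.lock().unwrap();
        s.1 = true;
        self.cvar.notify_all();
    }

    /// Blocks until both conditions are satisfied.
    pub fn wait_read(&self) {
        let mut s = self.state.lock().unwrap();
        while !(s.0 && s.1) {
            s = self.cvar.wait(s).unwrap();
        }
    }
}
```

A reader thread calling `wait_read` unblocks only after both notifications have fired, regardless of the order in which they arrive.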

bsbds avatar Nov 02 '23 01:11 bsbds

The fixes for the first and third issues are trivial. I'll provide some details about the refactor for the second issue.

As the after-sync stage is executed concurrently, we first need to wait for the lower indices to complete before we trigger a higher index.

I use a binary heap (min-heap), i.e. a priority queue, to track the triggered indices, and a `next` field to store the lowest index that hasn't been triggered yet.

struct IndexBarrierInner {
    next: u64,
    indices: BinaryHeap<IndexRevision>,
}

Each time the after sync triggers an index, we push it into the heap and repeat the following procedure: check whether the heap top equals `next`; if it does, pop the top element and increment `next`, otherwise break out of the loop.

    pub(crate) fn trigger(&self, index: u64, revision: i64) {
        let mut inner_l = self.inner.lock();
        inner_l.indices.push(IndexRevision::new(index, revision));
        // Pop entries in order for as long as the heap top matches `next`.
        while inner_l
            .indices
            .peek()
            .map_or(false, |i| i.index.eq(&inner_l.next))
        {
            let IndexRevision { index, revision } = inner_l
                .indices
                .pop()
                .unwrap_or_else(|| unreachable!("IndexRevision should be Some"));
            // ... notify the barriers waiting on `index`/`revision` here ...
            inner_l.next = inner_l.next.overflow_add(1);
        }
    }

bsbds avatar Nov 02 '23 02:11 bsbds

Issues 2 and 3 are no longer valid after a recent refactor; I will only fix the first one.

bsbds avatar Jan 04 '24 01:01 bsbds

Refer to: #782

Phoenix500526 avatar Apr 19 '24 08:04 Phoenix500526

Closed in #857

Phoenix500526 avatar Aug 26 '24 12:08 Phoenix500526