[Bug]: Read State
There are several read-state related bugs found in the Jepsen tests.
- [x] 1. Lost cmd ids

  Sometimes the cmd ids are lost during the test. This is caused by the current read state only collecting `sp`, which is not sufficient: `sp` only stores the currently speculatively executed cmds, while later conflicting commands go into `ucp`. The solution is to add the `ucp` cmds to the read state response.

- [x] 2. Index barrier not triggered in order
  Currently the `IndexBarrier` uses `last_trigger_index` to track the last executed index. However, the after-sync stage executes concurrently, so a command with a larger index may execute before a command with a smaller index. This requires a refactor.
```rust
/// Trigger all barriers whose index is less than or equal to the given index.
pub(crate) fn trigger(&self, index: u64) {
    let mut inner_l = self.inner.lock();
    if inner_l.last_trigger_index < index {
        inner_l.last_trigger_index = index;
    }
    let mut split_barriers = inner_l.barriers.split_off(&(index.overflow_add(1)));
    std::mem::swap(&mut inner_l.barriers, &mut split_barriers);
    for (_, barrier) in split_barriers {
        barrier.notify(usize::MAX);
    }
}
```
- [x] 3. Commit index on leader may be newer than uncommitted cmds

  The commit index may be newer because the after-sync stage executes concurrently, and a cmd with a higher log index may complete before a cmd with a lower index. To resolve this problem, we should combine waiting for the commit index with waiting for the cmd ids; only after both return can we proceed with a serializable read on the current node.
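The combined wait for issue 3 can be sketched as follows. This is a minimal std-only model, assuming hypothetical signal sources for the commit-index wait and the cmd-ids wait; none of these names are the actual Xline API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch: the read proceeds only after BOTH the commit-index
// wait and the cmd-ids wait have returned. The two channels stand in for
// the real notification mechanisms.
fn serializable_read_gate() -> &'static str {
    let (commit_tx, commit_rx) = mpsc::channel();
    let (cmds_tx, cmds_rx) = mpsc::channel();

    // Simulate the two waits completing at different times.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        commit_tx.send(()).expect("send commit-index signal");
    });
    thread::spawn(move || {
        cmds_tx.send(()).expect("send cmd-ids signal");
    });

    // Block until both conditions hold before reading on this node.
    commit_rx.recv().expect("commit index reached");
    cmds_rx.recv().expect("cmd ids executed");
    "read may proceed"
}

fn main() {
    assert_eq!(serializable_read_gate(), "read may proceed");
}
```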
The fixes for the first and third issues are trivial. I'll provide some details about the refactor for the second issue.
As the after-sync stage executes concurrently, we need to wait for the lower index to complete before we trigger the higher index.
I use a binary heap (min-heap), i.e. a priority queue, to track the triggered indices, and a `next` field to store the lowest index that hasn't been triggered yet.
```rust
struct IndexBarrierInner {
    next: u64,
    indices: BinaryHeap<IndexRevision>,
}
```
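The `IndexRevision` type is not shown above. Since Rust's `BinaryHeap` is a max-heap by default, one way to get the min-heap behavior described here is to reverse `Ord` on the element type; a hypothetical sketch (the real `IndexRevision` definition may differ):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical sketch of `IndexRevision`; not the actual Xline definition.
#[derive(PartialEq, Eq)]
struct IndexRevision {
    index: u64,
    revision: i64,
}

// Reverse the comparison so the std max-heap pops the smallest index first.
impl Ord for IndexRevision {
    fn cmp(&self, other: &Self) -> Ordering {
        (other.index, other.revision).cmp(&(self.index, self.revision))
    }
}

impl PartialOrd for IndexRevision {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn pop_indices(pairs: &[(u64, i64)]) -> Vec<u64> {
    let mut heap = BinaryHeap::new();
    for &(index, revision) in pairs {
        heap.push(IndexRevision { index, revision });
    }
    let mut out = Vec::new();
    while let Some(ir) = heap.pop() {
        out.push(ir.index);
    }
    out
}

fn main() {
    // Pushed out of order, popped smallest-first thanks to the reversed `Ord`.
    assert_eq!(pop_indices(&[(3, 30), (1, 10), (2, 20)]), vec![1, 2, 3]);
}
```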
Each time the after sync triggers an index, we push it into the heap and then repeat the following procedure: check the heap top; if it equals `next`, we pop the top element and increment `next`; otherwise we break out of the loop.
```rust
pub(crate) fn trigger(&self, index: u64, revision: i64) {
    let mut inner_l = self.inner.lock();
    inner_l.indices.push(IndexRevision::new(index, revision));
    // Pop consecutive indices starting from `next`, advancing `next` each time.
    while inner_l
        .indices
        .peek()
        .map_or(false, |i| i.index.eq(&inner_l.next))
    {
        let IndexRevision { index, revision } = inner_l
            .indices
            .pop()
            .unwrap_or_else(|| unreachable!("IndexRevision should be Some"));
        inner_l.next = inner_l.next.overflow_add(1);
        // ... notify the barriers waiting on `index`/`revision` here ...
    }
}
```
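To illustrate the loop above, here is a minimal single-threaded model, without the lock and barrier bookkeeping, and using `std::cmp::Reverse` in place of a custom `Ord` for the min-heap. It shows out-of-order triggers draining in index order:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Toy model of the ordered-trigger loop; names are illustrative only.
struct Barrier {
    next: u64,
    indices: BinaryHeap<Reverse<u64>>,
}

impl Barrier {
    /// Buffer `index`, then fire every consecutive index starting from `next`.
    fn trigger(&mut self, index: u64) -> Vec<u64> {
        let mut fired = Vec::new();
        self.indices.push(Reverse(index));
        while self.indices.peek().map_or(false, |Reverse(i)| *i == self.next) {
            let Reverse(i) = self.indices.pop().expect("peek succeeded");
            self.next += 1;
            fired.push(i); // the real code would notify the barrier here
        }
        fired
    }
}

fn run(events: &[u64]) -> Vec<Vec<u64>> {
    let mut b = Barrier { next: 1, indices: BinaryHeap::new() };
    events.iter().map(|&i| b.trigger(i)).collect()
}

fn main() {
    // After-sync completes out of order: index 2 arrives before index 1.
    // 2 is buffered until 1 arrives, which then fires both in order.
    assert_eq!(run(&[2, 1, 3]), vec![vec![], vec![1, 2], vec![3]]);
}
```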
Issues 2 and 3 are no longer valid after a recent refactor; I will only fix the first one.
Refer to: #782
Closed in #857