qdrant
qdrant copied to clipboard
Extend `SerdeWal` to better track the last truncated-to WAL index (#1815)
WAL operates in segments. When Wal::prefix_truncate
is called (during SerdeWal::ack
), WAL is not truncated precisely up to the until_index
, but up to the nearest segment with last_index
that is less-or-equal than until_index
.
Consider the pseudo-graphic illustration of the WAL that was truncated up to index 35:
| -------- | -------- | ===='++++ | ++++++++ | ++++++++ | ++++++++ |
10 20 30 35 40 50 60 70
-
'
marks the index 35 that has been truncated-to -
---
marks segments 10-30 that has been physically deleted -
+++
marks segments 35-70 that are still valid - and
===
marks part of segment 30-34, that is still physically present on disk, but that is "logically" deleted
This PR extends SerdeWal
type to track the last truncated-to index and transparently hide the "logically deleted" parts of WAL from the user of the SerdeWal
type.
Nice. 🤦♀️
@generall I've looked into the Update operation declined: No point with id <ID> found
warnings.
-
when applying operations to a
Segment
,Segment::handle_version
method is called at some point-
we can see, that if
op_point_offset
(which is point's "internal id") argument isNone
, thenop_num
is compared toSegment::version
- and if
Segment::version > op_num
, then the operation is "skipped" without being applied - the comment in that branch in
Segment::handle_version
says "Not a point operation...", but that's not exactly true, because when we are inserting a new point into a segment, the point does not have an "internal id" yet, and soSegment::handle_version
would be called withop_point_offset == None
-
we can see, that if
-
so, when reapplying upsert operation for an (eventually) deleted point,
Segment::handle_version
would be called withop_point_offset == None
, and because we are reapplying the operation,Segment::version > op_num
, and so the operation would be skipped -
now, if we then reapply the set payload operation, there's a
check_unprocessed_points
call at the end ofset_payload
(and some other functions) -
and
check_unprocessed_points
is a rather simplistic check, that does not in any way handle "operation was skipped, becauseop_num < Segment::version
" case
So, what can we do:
- do absolutely nothing and wait unless someone reports the issue with
Update operation declined...
warning once again - remove
check_unprocessed_points
- not a fan of this solution, because the logic around version checks is not exactly trivial and explicit warning is better than silent error
- refactor
check_unprocessed_points
to handle intentionally "skipped" operations somehow- should not be terribly difficult, just tedious
CI runs seem to fail due to issues with GH, I'll re-run them once https://www.githubstatus.com/ is green again.
which option does this PR implement?
which option does this PR implement?
So far - the first one. I wanted your input on this one.
@generall Implemented better check_unprocessed_points
check.
Rebased onto recent dev
.
@generall Added additional tracking of the most recent un-truncated index to the SerdeWal
type. Tried to add it to the wal
crate first, but it, somehow, felt "worse" there: seemed more of a hack, and I was less sure about correctness.
@ffuugoo does this close #1815 ?
@ffuugoo does this close #1815 ?
@coszio It does. I've implemented an explicit check to skip WAL entries that have been applied, and @generall suggested that the result would be the same if we improve WAL truncation tracking.