qdrant icon indicating copy to clipboard operation
qdrant copied to clipboard

Extend `SerdeWal` to better track the last truncated-to WAL index (#1815)

Open ffuugoo opened this issue 1 year ago • 8 comments

WAL operates in segments. When Wal::prefix_truncate is called (during SerdeWal::ack), WAL is not truncated precisely up to the until_index, but up to the nearest segment with last_index that is less-or-equal than until_index.

Consider the pseudo-graphic illustration of the WAL that was truncated up to index 35:

| -------- | -------- | ===='++++ | ++++++++ | ++++++++ | ++++++++ |
10         20         30    35    40         50         60         70
  • ' marks the index 35 that has been truncated-to
  • --- marks segments 10-30 that has been physically deleted
  • +++ marks segments 35-70 that are still valid
  • and === marks part of segment 30-34, that is still physically present on disk, but that is "logically" deleted

This PR extends SerdeWal type to track the last truncated-to index and transparently hide the "logically deleted" parts of WAL from the user of the SerdeWal type.

ffuugoo avatar May 09 '23 15:05 ffuugoo

Nice. 🤦‍♀️ Screenshot 2023-05-10 at 18 00 43

ffuugoo avatar May 10 '23 16:05 ffuugoo

@generall I've looked into the Update operation declined: No point with id <ID> found warnings.

  • when applying operations to a Segment, Segment::handle_version method is called at some point

    • we can see, that if op_point_offset (which is point's "internal id") argument is None, then op_num is compared to Segment::version
    • and if Segment::version > op_num, then the operation is "skipped" without being applied
    • the comment in that branch in Segment::handle_version says "Not a point operation...", but that's not exactly true, because when we are inserting a new point into a segment, the point does not have an "internal id" yet, and so Segment::handle_version would be called with op_point_offset == None
  • so, when reapplying upsert operation for an (eventually) deleted point, Segment::handle_version would be called with op_point_offset == None, and because we are reapplying the operation, Segment::version > op_num, and so the operation would be skipped

  • now, if we then reapply the set payload operation, there's a check_unprocessed_points call at the end of set_payload (and some other functions)

  • and check_unprocessed_points is a rather simplistic check, that does not in any way handle "operation was skipped, because op_num < Segment::version" case

So, what can we do:

  • do absolutely nothing and wait unless someone reports the issue with Update operation declined... warning once again
  • remove check_unprocessed_points
    • not a fan of this solution, because the logic around version checks is not exactly trivial and explicit warning is better than silent error
  • refactor check_unprocessed_points to handle intentionally "skipped" operations somehow
    • should not be terribly difficult, just tedious

ffuugoo avatar May 10 '23 16:05 ffuugoo

CI runs seem to fail due to issues with GH, I'll re-run them once https://www.githubstatus.com/ is green again.

ffuugoo avatar May 10 '23 16:05 ffuugoo

which option does this PR implement?

generall avatar May 10 '23 20:05 generall

which option does this PR implement?

So far - the first one. I wanted your input on this one.

ffuugoo avatar May 10 '23 21:05 ffuugoo

@generall Implemented better check_unprocessed_points check.

ffuugoo avatar May 12 '23 08:05 ffuugoo

Rebased onto recent dev.

ffuugoo avatar May 25 '23 09:05 ffuugoo

@generall Added additional tracking of the most recent un-truncated index to the SerdeWal type. Tried to add it to the wal crate first, but it, somehow, felt "worse" there: seemed more of a hack, and I was less sure about correctness.

ffuugoo avatar May 30 '23 16:05 ffuugoo

@ffuugoo does this close #1815 ?

coszio avatar Jun 13 '23 19:06 coszio

@ffuugoo does this close #1815 ?

@coszio It does. I've implemented an explicit check to skip WAL entries that have been applied, and @generall suggested that the result would be the same if we improve WAL truncation tracking.

ffuugoo avatar Jun 14 '23 16:06 ffuugoo