etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Always keep the compact revision, and delay compact tombstone revision

Open ahrtr opened this issue 1 year ago • 7 comments

Always keep the compact revision, no matter it's a tomestone revision or not. If it's a tombstone revision, then cleanup it in next compact operation.

PoC of option 2 in https://docs.google.com/document/d/1APJE38DxENsRLLVapRz1CQ6UD4-uYaFutO12Y01VcNw/edit

cc @fuweid

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

ahrtr avatar Jun 12 '24 12:06 ahrtr

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 68.81%. Comparing base (a6bb22d) to head (763fee3). Report is 2 commits behind head on main.

:exclamation: Current head 763fee3 differs from pull request most recent head 891e412

Please upload reports for the commit 891e412 to get more accurate results.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files Coverage Δ
server/storage/mvcc/key_index.go 71.25% <100.00%> (+7.09%) :arrow_up:

... and 25 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18162      +/-   ##
==========================================
- Coverage   68.94%   68.81%   -0.14%     
==========================================
  Files         416      416              
  Lines       35127    35121       -6     
==========================================
- Hits        24219    24169      -50     
- Misses       9518     9556      +38     
- Partials     1390     1396       +6     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a6bb22d...891e412. Read the comment docs.

codecov-commenter avatar Jun 12 '24 12:06 codecov-commenter

@fuweid Please feel free to continue to work on the issue https://github.com/etcd-io/etcd/issues/18089 on top of this PR if it makes sense to you. Pls feel free to ping me on slack if you want discussion.

ahrtr avatar Jun 12 '24 13:06 ahrtr

These are all the test cases I can think of, please feel free to add more. Most likely some of them have already been covered by existing test cases.

I also added the test cases in the doc (see option 2)

cc @fuweid @serathius

Test cases

unit test on keyIndex

case 1: compact a tombstone revision X, it shouldn't be compacted. Next compaction should cleanup the revision X for both cases below,

  • 1.1 The revision X is the last revision for the key
  • 1.2 The revision X isn't the last revision for the key

case 2: compact a normal revision X, it shouldn't be compacted. In next compaction,

  • 2.1 If the revision X is the last revision for the key, it shouldn't be compacted.
  • 2.2 If the revision X isn't the last revision for the key, it should be compacted.

e2e test on watch

case 1: compact tomestone revisions shouldn't impact watch rough steps: 1) start a watch client to watch on a key 2) start a goroutine to continuous put or delete a key 3) perform compaction against the different tombstone revisions multiple times, verify that no revisions are dropped.

case 2: compact normal revisions shouldn't impact watch similar steps as above, but we only execute put operations

case 3: a new watch starting on a compacted tombstone revision should receive the delete event Refer to the reproduction steps in the issue

e2e test on hash values

case 1: All members should have the same hash value up to the compacted revision after compacting a tombstone revision rough steps: 1) start a 3 member cluster; 2) execute some put and delete operations; 3) execute compaction on revision X; 4) get hashKV up to revision X from all members, verify the hash values are consistent

  • 1.1 all members have the same etcd version
  • 1.2 mix versions, i.e. main vs 3.5.14 or release-3.5 vs 3.5.14

case 2: all member should have the same hash value up to the compacted revision after compacting a non-tombstone revision similar steps as above, but execute compact on a non-tombstone revision

  • 2.1 all members have the same etcd version
  • 2.2 mix versions, i.e. main vs 3.5.14 or release-3.5 vs 3.5.14

case 3: All members should have the same hash value up to a non-compacted revision after compacting a tombstone revision similar steps as case 1, but get hashKv up to a non-compacted revision

  • 3.1 all members have the same etcd version
  • 3.2 mix versions, i.e. main vs 3.5.14 or release-3.5 vs 3.5.14

case 4: All members should have the same hash value up to a non-compacted revision after compacting a non-tombstone revision similar steps as case 2, but get hashKv up to a non-compacted revision

  • 4.1 all members have the same etcd version
  • 4.2 mix versions, i.e. main vs 3.5.14 or release-3.5 vs 3.5.14

robustness test

Support compaction, and verify the impact on hash values and watch.

ahrtr avatar Jun 14 '24 15:06 ahrtr

@fuweid are you still working on https://github.com/etcd-io/etcd/issues/18089 based on this PR?

ahrtr avatar Jun 21 '24 08:06 ahrtr

Can we start implementing those tests?

The bigger problem then the bug existing is fact that we don't have those tests already. Even before we have a fix, we can just add the test to codify the current behavior. It should give us better visibility into impact of when we implement the fix.

/cc @MadhavJivrajani @siyuanfoundation @fuweid @jmhbnz @henrybear327 @ah8ad3

serathius avatar Jun 21 '24 09:06 serathius

@serathius: GitHub didn't allow me to request PR reviews from the following users: ah8ad3.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @MadhavJivrajani @siyuanfoundation @fuweid @jmhbnz @henrybear327 @ah8ad3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 21 '24 09:06 k8s-ci-robot

@ahrtr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-unit-test-386 891e4124b1853cb5def231d89d89ec4a29ddba96 link false /test pull-etcd-unit-test-386
pull-etcd-unit-test-amd64 891e4124b1853cb5def231d89d89ec4a29ddba96 link true /test pull-etcd-unit-test-amd64
pull-etcd-unit-test-arm64 30eaf124d8eaeeb7c05fc2845df708b8e8609e8b link true /test pull-etcd-unit-test-arm64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

k8s-ci-robot avatar Aug 05 '24 22:08 k8s-ci-robot

Can we start implementing those tests?

The bigger problem then the bug existing is fact that we don't have those tests already. Even before we have a fix, we can just add the test to codify the current behavior. It should give us better visibility into impact of when we implement the fix.

/cc @MadhavJivrajani @siyuanfoundation @fuweid @jmhbnz @henrybear327 @ah8ad3

I would like to start working on those tests but i am a little bit lost of track if anybody started it or not is there any update?

ah8ad3 avatar Aug 06 '24 05:08 ah8ad3