Periodic corruption check doesn’t do anything after compaction
What happened?
After turning on --experimental-corrupt-check-time, I’ve noticed that any time there has been a compaction and no recent write to etcd, the corruption check fails to actually run.
The output on the leader is
{"level":"info","ts":"2022-08-02T17:55:38.557Z","caller":"etcdserver/corrupt.go:244","msg":"finished peer corruption check","number-of-peers-checked":0}
Looking at a follower, we see the requested revision has been compacted
{"level":"warn","ts":"2022-08-02T17:48:38.516Z","caller":"etcdserver/corrupt.go:377","msg":"failed to get hashKV","requested-revision":2592,"error":"mvcc: required revision has been compacted"}
I then wrote a key to etcd and observed the corruption check working correctly on the next run.
What did you expect to happen?
The corruption check should work regardless of whether or not a compaction has happened very recently.
How can we reproduce it (as minimally and precisely as possible)?
- Set --experimental-corrupt-check-time to something very short, like 1m
- Compact your etcd cluster (a minimal Go sketch of these steps follows this list)
- Note that the next corruption check doesn't work
- Insert a key into etcd
- Note that the next corruption check does work
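For illustration, a minimal sketch of the compaction step using the clientv3 API, assuming a local endpoint on 127.0.0.1:2379; the endpoint and key name are placeholders, not anything from the cluster above:

```go
// Hypothetical repro helper: compact the cluster up to its current revision
// and then perform no further writes, so the next periodic corruption check
// targets the compacted revision.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Any read returns the cluster's current revision in the response header.
	resp, err := cli.Get(ctx, "some-key") // placeholder key
	if err != nil {
		log.Fatal(err)
	}

	// Compact up to that revision, then stop writing until the next
	// corruption check (set via --experimental-corrupt-check-time) runs.
	if _, err := cli.Compact(ctx, resp.Header.Revision); err != nil {
		log.Fatal(err)
	}
	log.Printf("compacted at revision %d", resp.Header.Revision)
}
```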
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
etcd Version: 3.5.3
Git SHA: GitNotFound
Go Version: go1.16.2
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Leaving these off. Happy to share more details if required. It is a healthy five-node cluster, no learners, all running v3.5.3.
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
Are you looking into it, @justinhellreich-stripe? Otherwise I will try to investigate the issue.
Hey @jbml -- I've done some digging and found that the leader essentially just gets the latest revision and then requests the KV hash at that revision from each peer: https://git.corp.stripe.com/stripe-internal/gocode/blob/fb2be906edbbfcf0f732b0d69431059d129a5ea1/vendor/go.etcd.io/etcd/server/v3/etcdserver/corrupt.go#L154
It surprises me that after a compaction the latest revision can no longer be queried. Is that the expected behavior? Based on testing with plain etcdctl, it doesn't seem to be.
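To make that flow concrete, here is a rough, simplified sketch as I understand it; the type and function names are mine, not the ones in corrupt.go. The leader takes a single latest revision and asks every peer for its KV hash at exactly that revision, and a peer that has compacted that revision is simply skipped:

```go
package main

import (
	"errors"
	"fmt"
)

var errCompacted = errors.New("mvcc: required revision has been compacted")

// peer models a follower answering a hashKV request at a given revision.
type peer struct {
	compactRev int64 // revision this peer has compacted up to
	hash       uint32
}

func (p peer) hashKVAtRev(rev int64) (uint32, error) {
	// Mirrors the follower log above: the request is rejected if the
	// revision is not newer than the compaction point.
	if rev <= p.compactRev {
		return 0, errCompacted
	}
	return p.hash, nil
}

// checkPeers returns how many peers could actually be checked, i.e. the
// "number-of-peers-checked" field in the leader log.
func checkPeers(latestRev int64, localHash uint32, peers []peer) int {
	checked := 0
	for _, p := range peers {
		h, err := p.hashKVAtRev(latestRev)
		if err != nil {
			continue // peer skipped, which is what happens right after a compaction
		}
		checked++
		if h != localHash {
			fmt.Println("hash mismatch detected")
		}
	}
	return checked
}

func main() {
	// After compacting at revision 2592 with no new writes, the latest
	// revision is still 2592, so every peer rejects the request.
	peers := []peer{{compactRev: 2592, hash: 7}, {compactRev: 2592, hash: 7}}
	fmt.Println("number-of-peers-checked:", checkPeers(2592, 7, peers)) // prints 0
}
```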
I might be a bit out of my depth, so if you are able to look into it or suggest a possible fix, that would be great!
Thanks, I will take a look @justinhellreich-stripe
When the leader requests the hash for a particular revision from its peers (as part of the corruption check), it first verifies that the revision in question has not already been compacted:
https://github.com/etcd-io/etcd/blob/e61d43179291562c44225bf49a6d97c52a65fdca/server/storage/mvcc/kvstore.go#L180-L182
I think the problem is that this compaction check (the first "if" above) returns an error even when the requested revision equals the compacted revision (the "<=" comparison).
I think it should instead use a "<" comparison, which would be consistent with how a normal range request (i.e. etcdctl get) checks whether the requested revision has already been compacted, as below:
https://github.com/etcd-io/etcd/blob/fff5d00ccfddaf5f06fa11cad65c1ecaa8c09f87/server/storage/mvcc/kvstore_txn.go#L74-L76
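To make the difference concrete, here is a small self-contained sketch of the two guards; this is my own simplified code, not the actual kvstore functions, using the revision 2592 from the follower log above:

```go
package main

import (
	"errors"
	"fmt"
)

var errCompacted = errors.New("mvcc: required revision has been compacted")

// hashByRevGuard mirrors the "<=" check linked above: a request at the
// revision that equals the compaction point is rejected.
func hashByRevGuard(rev, compactMainRev int64) error {
	if rev > 0 && rev <= compactMainRev {
		return errCompacted
	}
	return nil
}

// rangeGuard mirrors the "<" check used by a normal range request
// (e.g. etcdctl get): only strictly older revisions are rejected.
func rangeGuard(rev, compactMainRev int64) error {
	if rev < compactMainRev {
		return errCompacted
	}
	return nil
}

func main() {
	const rev, compacted = 2592, 2592 // no writes since the compaction
	fmt.Println("hashKV guard:", hashByRevGuard(rev, compacted)) // error
	fmt.Println("range guard: ", rangeGuard(rev, compacted))     // <nil>
}
```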
I will do some tests to verify this.