Periodic corruption check doesn’t do anything after compaction
What happened?
After turning on --experimental-corrupt-check-time, I’ve noticed that any time there has been a compaction and no recent write to etcd, the corruption check fails to actually run.
The output on the leader is
{"level":"info","ts":"2022-08-02T17:55:38.557Z","caller":"etcdserver/corrupt.go:244","msg":"finished peer corruption check","number-of-peers-checked":0}
Looking at a follower, we see the requested revision has been compacted
{"level":"warn","ts":"2022-08-02T17:48:38.516Z","caller":"etcdserver/corrupt.go:377","msg":"failed to get hashKV","requested-revision":2592,"error":"mvcc: required revision has been compacted"}
I then wrote a key to etcd and observed the corruption check working correctly on the next run.
What did you expect to happen?
The corruption check should work regardless of whether or not a compaction has happened very recently.
How can we reproduce it (as minimally and precisely as possible)?
- Set --experimental-corrupt-check-time to something very short, like 1m
- Compact your etcd cluster (a minimal Go sketch of these steps follows this list)
- Note that the next corruption check doesn't work
- Insert a key into etcd
- Note that the next corruption check does work
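For illustration, a minimal sketch of the compaction step using the clientv3 API, assuming a local endpoint on 127.0.0.1:2379; the endpoint and key name are placeholders, not anything from the cluster above:

```go
// Hypothetical repro helper: compact the cluster up to its current revision
// and then perform no further writes, so the next periodic corruption check
// targets the compacted revision.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Any read returns the cluster's current revision in the response header.
	resp, err := cli.Get(ctx, "some-key") // placeholder key
	if err != nil {
		log.Fatal(err)
	}

	// Compact up to that revision, then stop writing until the next
	// corruption check (set via --experimental-corrupt-check-time) runs.
	if _, err := cli.Compact(ctx, resp.Header.Revision); err != nil {
		log.Fatal(err)
	}
	log.Printf("compacted at revision %d", resp.Header.Revision)
}
```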
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
etcd Version: 3.5.3
Git SHA: GitNotFound
Go Version: go1.16.2
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Leaving these off. Happy to share more details if required. It is a healthy five-node cluster, no learners, all running v3.5.3.
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
Are you looking into it, @justinhellreich-stripe? Otherwise I will try to investigate the issue.
Hey @jbml -- I've done some digging and found that the leader essentially just gets the latest revision and then requests the KV hash at that revision from each peer: https://git.corp.stripe.com/stripe-internal/gocode/blob/fb2be906edbbfcf0f732b0d69431059d129a5ea1/vendor/go.etcd.io/etcd/server/v3/etcdserver/corrupt.go#L154
It surprises me that after a compaction the latest revision can no longer be queried. Is that the expected behavior? Based on testing with plain etcdctl, it doesn't seem to be.
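To make that flow concrete, here is a rough, simplified sketch as I understand it; the type and function names are mine, not the ones in corrupt.go. The leader takes a single latest revision and asks every peer for its KV hash at exactly that revision, and a peer that has compacted that revision is simply skipped:

```go
package main

import (
	"errors"
	"fmt"
)

var errCompacted = errors.New("mvcc: required revision has been compacted")

// peer models a follower answering a hashKV request at a given revision.
type peer struct {
	compactRev int64 // revision this peer has compacted up to
	hash       uint32
}

func (p peer) hashKVAtRev(rev int64) (uint32, error) {
	// Mirrors the follower log above: the request is rejected if the
	// revision is not newer than the compaction point.
	if rev <= p.compactRev {
		return 0, errCompacted
	}
	return p.hash, nil
}

// checkPeers returns how many peers could actually be checked, i.e. the
// "number-of-peers-checked" field in the leader log.
func checkPeers(latestRev int64, localHash uint32, peers []peer) int {
	checked := 0
	for _, p := range peers {
		h, err := p.hashKVAtRev(latestRev)
		if err != nil {
			continue // peer skipped, which is what happens right after a compaction
		}
		checked++
		if h != localHash {
			fmt.Println("hash mismatch detected")
		}
	}
	return checked
}

func main() {
	// After compacting at revision 2592 with no new writes, the latest
	// revision is still 2592, so every peer rejects the request.
	peers := []peer{{compactRev: 2592, hash: 7}, {compactRev: 2592, hash: 7}}
	fmt.Println("number-of-peers-checked:", checkPeers(2592, 7, peers)) // prints 0
}
```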
I might be a bit out of my depth, so if you are able to look into it or suggest a possible fix, that would be great!
Thanks, I will take a look @justinhellreich-stripe
When the leader requests the hash for a particular revision from its peers (as part of the corruption check), it first verifies that the revision in question has not already been compacted:
https://github.com/etcd-io/etcd/blob/e61d43179291562c44225bf49a6d97c52a65fdca/server/storage/mvcc/kvstore.go#L180-L182
I think the problem is that this compaction check (the first "if" above) returns an error even when the requested revision equals the compacted revision (the "<=" comparison).
I think it should instead use a "<" comparison, which would be consistent with how a normal range request (i.e. etcdctl get) checks whether the requested revision has already been compacted, as below:
https://github.com/etcd-io/etcd/blob/fff5d00ccfddaf5f06fa11cad65c1ecaa8c09f87/server/storage/mvcc/kvstore_txn.go#L74-L76
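To make the difference concrete, here is a small self-contained sketch of the two guards; this is my own simplified code, not the actual kvstore functions, using the revision 2592 from the follower log above:

```go
package main

import (
	"errors"
	"fmt"
)

var errCompacted = errors.New("mvcc: required revision has been compacted")

// hashByRevGuard mirrors the "<=" check linked above: a request at the
// revision that equals the compaction point is rejected.
func hashByRevGuard(rev, compactMainRev int64) error {
	if rev > 0 && rev <= compactMainRev {
		return errCompacted
	}
	return nil
}

// rangeGuard mirrors the "<" check used by a normal range request
// (e.g. etcdctl get): only strictly older revisions are rejected.
func rangeGuard(rev, compactMainRev int64) error {
	if rev < compactMainRev {
		return errCompacted
	}
	return nil
}

func main() {
	const rev, compacted = 2592, 2592 // no writes since the compaction
	fmt.Println("hashKV guard:", hashByRevGuard(rev, compacted)) // error
	fmt.Println("range guard: ", rangeGuard(rev, compacted))     // <nil>
}
```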
I will do some tests to verify this.