etcd
etcd copied to clipboard
etcdserver: fix corruption check when server has just been compacted
When a key-value store corruption check happens immediately after a compaction, the revision at which the key-value store hash is computed, is the compacted revision itself. In that case, the hash computation logic was incorrect because it returned an ErrCompacted error; this error should instead be returned when the revision at which the key-value store is hashed, is strictly lower than the compacted revision.
Fixes #14325
Signed-off-by: Jeremy Leach [email protected]
My issues with the proposed change:
- It makes
HashKVrevision semantics inconsistent with rest of API. Compaction removes etcd history up to and including specified revision. This operation is inclusive, meaning it also removes the changes from the revision. This is why you cannot request neitherHashKVnorRangeor revision that was compacted. This change would make it possible to get Hash for revision that is no longer accessible rest of API. - Impact is negligible as provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and than don't do any updates for long enough for corruption check to run. Periodic corruption check is pretty heavy operation so I also don't expect anyone to set it for 1 minute period. For example proposed period was 12h https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server
- Periodic check is known to be not reliable. We have already implemented a replacement. https://github.com/etcd-io/etcd/issues/14039. I think we can consider deprecating the old check when we have enough production data.
Thank you for reviewing the PR @serathius
- It makes
HashKVrevision semantics inconsistent with rest of API. Compaction removes etcd history up to and including specified revision. This operation is inclusive, meaning it also removes the changes from the revision. This is why you cannot request neitherHashKVnorRangeor revision that was compacted. This change would make it possible to get Hash for revision that is no longer accessible rest of API.
The intention here was actually to make the HashByRev API consistent with Range, which I believe is what the author of #14325 meant in https://github.com/etcd-io/etcd/issues/14325#issuecomment-1210717339 when he/she said:
It's a surprise to me that after compaction, the latest revision is not available to be queried at. Does that seem like the expected behavior? Based on testing with just etcdctl that doesn't seem to be the case.
I believe the following test illustrates this point. Start with a fresh etcd cluster:
> ./bin/etcdctl put foo bar1
> ./bin/etcdctl put foo bar2
> ./bin/etcdctl put foo bar3
> ./bin/etcdctl put foo bar4
> ./bin/etcdctl put foo bar5
>. /tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]
rev={main:4 sub:0}, value=[key "foo" | val "bar3" | created 2 | mod 4 | ver 3]
rev={main:3 sub:0}, value=[key "foo" | val "bar2" | created 2 | mod 3 | ver 2]
rev={main:2 sub:0}, value=[key "foo" | val "bar1" | created 2 | mod 2 | ver 1]
Then compact it at revision 5:
> ./bin/etcdctl compact 5
compacted revision 5
./tools/etcd-dump-db/etcd-dump-db iterate-bucket ~/projects/etcd/data/uniquenode.etcd key --decode
rev={main:6 sub:0}, value=[key "foo" | val "bar5" | created 2 | mod 6 | ver 5]
rev={main:5 sub:0}, value=[key "foo" | val "bar4" | created 2 | mod 5 | ver 4]
Then check what revisions are accessible with Range:
> ./bin/etcdctl --rev=4 get foo
{"level":"warn","ts":"2022-09-18T16:19:51.102+1000","logger":"etcd-client","caller":"v3/retry_interceptor.go:64",
"msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00000a780/127.0.0.1:2379",
"method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = OutOfRange
desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted
> ./bin/etcdctl --rev=5 get foo
foo
bar4
> ./bin/etcdctl --rev=6 get foo
foo
bar5
As the version at which the compaction is done (5) is still available, it made sense to me that HashByRev raising an "required revision has been compacted" error when called on the revision, was indeed an issue.
This inconsistency is what we can see when we compare this for the HashByRev API: https://github.com/etcd-io/etcd/blob/e61d43179291562c44225bf49a6d97c52a65fdca/server/storage/mvcc/kvstore.go#L180-L182 and this for the Range API: https://github.com/etcd-io/etcd/blob/fff5d00ccfddaf5f06fa11cad65c1ecaa8c09f87/server/storage/mvcc/kvstore_txn.go#L74-L76 with the comparison being "<=" on one side, and "<" on the other side.
- Impact is negligible as provided scenario is not realistic. I don't expect that in production clusters someone will compact up to the latest revision and than don't do any updates for long enough for corruption check to run. Periodic corruption check is pretty heavy operation so I also don't expect anyone to set it for 1 minute period. For example proposed period was 12h https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-4.0.md#etcd-server
Fully agree that this is a rare case.
- Periodic check is known to be not reliable. We have already implemented a replacement. etcd detects data corruption by default #14039. I think we can consider deprecating the old check when we have enough production data.
Fair enough.
@serathius : would you suggest we simply close the PR and associated issue, please ?
The intention here was actually to make the HashByRev API consistent with Range
Sorry for mistaking it there. I think there is some value in making API consistent, however from top of my head I don't know if this change doesn't negatively impact any of existing mechanisms using HashKV. @ahrtr what's your opinion?
The intention here was actually to make the HashByRev API consistent with Range
Sorry for mistaking it there. I think there is some value in making API consistent, however from top of my head I don't know if this change doesn't negatively impact any of existing mechanisms using HashKV. @ahrtr what's your opinion?
I will have a closer look at this PR and related issue late today or tomorrow morning my time.
I think this is a valid PR.
When the leader performs the periodic check , it calculates the hash up to the currentRev. The reason why it can pass the check below is that the rev is 0.
https://github.com/etcd-io/etcd/blob/54f9483e725f7300440fcabd9fbc4b94c81da3bd/server/storage/mvcc/kvstore.go#L180
But when leader uses the same currentRev to get peers' hashes, it fails to pass the same check above, because the rev isn't 0 anymore. It doesn't make sense. It's exactly what this PR tries to fix.
Please note that when I review this PR, I found a related issue, please see https://github.com/etcd-io/etcd/pull/14510
Overall looks good to me, and I don't see any negative impact.
I don't know if this change doesn't negatively impact any of existing mechanisms using HashKV
Hello @serathius , Do you mind reviewing the PR when time permits please ?
cc @spzala @ptabor @serathius @mitake
@jbml Could you backport this PR to 3.5 and 3.4 if needed?
I found this still do not backport to v3.4 and v3.5. I would help to do this.