etcd
etcd copied to clipboard
HashKV should compute the hash of all MVCC keys up to a given revision
Bug report criteria
- [X] This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
- [X] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- [X] You have read the etcd bug reporting guidelines.
- [X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
Based on https://etcd.io/docs/v3.5/dev-guide/api_reference_v3/, HashKV api should compute the hash of all MVCC keys up to a given revision, even if the key/value is tombstone. However, HashKV could skip compacted revision.
https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/server/storage/mvcc/hash.go#L60-L76
For the compacted revision, the lower.GreaterThan(kr) always return true.
Assume that the compacted revision is for key A. If the key A has revisions higher than compacted revision, the HashKV won't compute the hash on compacted revision.
Case 1: ETCD contains more than two keys
For example, there are two keys in store.
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
// {{2, 0}, {3, 0}, {4, 0}(t)[3]}
// key: "foo2"
// modified: 2
// generations:
// {{2, 1}[1]}
After compaction on Revision{Main: 5}, the store will be like
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
// key: "foo2"
// modified: 2
// generations:
// {{2, 1}[1]}
When we perform HashKV on Revision{Main: 7}, the hash should compute the following revisions:
- Revision{Main: 2, Sub: 1}
- Revision{Main: 5, Sub: 0}
- Revision{Main: 6, Sub: 0}
- Revision{Main: 7, Sub: 0}
However, HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns two revisions {Main: 2, Sub: 1} and {Main: 7}, The Revision{Main: 5} isn't in result of kvindex.Keep.
Case 2: ETCD contains only one key
There is key foo.
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
// {{2, 0}, {3, 0}, {4, 0}(t)[3]}
After compaction on Revision{Main: 5}, the store will be like
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
When we perform HashKV on Revision{Main: 7}, the hash should compute the following revisions:
- Revision{Main: 5, Sub: 0}
- Revision{Main: 6, Sub: 0}
- Revision{Main: 7, Sub: 0}
HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns one revision {Main: 7}.
However, when we perform HashKV on Revision{Main: 8}, the kvindex.Keep will return empty keep because the Revision{Main: 8} is tombstone. So, the hash result will compute the following revisions, which is expectation.
- Revision{Main: 5, Sub: 0}
- Revision{Main: 6, Sub: 0}
- Revision{Main: 7, Sub: 0}
- Revision{Main: 8, Sub: 0}
What did you expect to happen?
HashKV should compute the hash of all MVCC keys up to a given revision
How can we reproduce it (as minimally and precisely as possible)?
https://github.com/fuweid/etcd/commit/a6454f2fcc13aa116086b4df907541d3942b3b00
$ cd server/storage/mvcc
$ go test -v -run TestHashKVImpactedByKeep -count=1 ./
Anything else we need to know?
The HashKV was introduced by #8263. IMO, the keep, available revision map, was used to skip the revisions which were marked deleted in on-going compaction. If there is no compaction, the keep is always empty and then HashKV will compute all the MVCC keys up to the revision.
#8263 has bug and then #8333 introduced keyIndex.Keep interface to fix bug.
However, after #8333, the non-empty keep map forces HashKV to skip compacted revision.
Etcd version (please run commands below)
- main
- release-3.5
- release-3.4
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
Value return by HashKV api call till x revision should be equal to the Hash of snapshot taken upto x revision (removing the appended hash). Is this correct @fuweid ?
IMO, this is not correct as HashKV api call calculates the hash of all MVCC key-values, whereas snapshot is snapshot of etcd db which also contains cluster information, hence the hash will not be same.
Hi @ishan16696
the Hash of snapshot taken upto x revision
I don't follow this. Do you mean Hash API? If so, Hash will compute all MVCC keys, confState and other buckets in bbolt.
https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/api/etcdserverpb/rpc.proto#L226-L238
Hi @fuweid ,
What I mean by snapshot upto x revision is:
- Suppose etcd is at
xrevision then somebody takes a snapshot of etcd using snapshot api call. https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/api/etcdserverpb/rpc.proto#L248-L254
Than to validate the integrity of this snapshot, If I calculate the hash this snapshot taken via snapshot api call upto x revision(removing appended hash) , will it be same as value return by Hash API call ?
Anyway, my doubt is different from this issue, I have now open a separate issue now: https://github.com/etcd-io/etcd/issues/18340
Than to validate the integrity of this snapshot, If I calculate the hash this snapshot taken via snapshot api call upto x revision(removing appended snapshot)
The Snapshot uses sha256sum to compute the whole database.
https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/server/etcdserver/api/v3rpc/maintenance.go#L145
However, both Hash and HashKV are using crc32.New(crc32.MakeTable(crc32.Castagnoli)). They are different.
will it be same as value return by Hash API call ?
Unfortunately, it's not.
Compaction is tightly coupled with HashKV, we have to consider/discuss them together. We need to get consensus on https://github.com/etcd-io/etcd/pull/18274#issuecomment-2223188506 firstly.
As mentioned in that comment, compatibility (ensure new version and old version generate the same hash value) is also a big concern. We need to leverage the cluster-level feature to resolve it.
Probably it would be better to turn that into a KEP "Consistent Compaction and HashKV". cc @ivanvc @jmhbnz @serathius @siyuanfoundation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
closed it due to no bandwidth. feel free to reopen it if you want to carry it.