etcd icon indicating copy to clipboard operation
etcd copied to clipboard

HashKV should compute the hash of all MVCC keys up to a given revision

Open fuweid opened this issue 1 year ago • 6 comments

Bug report criteria

  • [X] This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
  • [X] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
  • [X] You have read the etcd bug reporting guidelines.
  • [X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.

What happened?

Based on https://etcd.io/docs/v3.5/dev-guide/api_reference_v3/, HashKV api should compute the hash of all MVCC keys up to a given revision, even if the key/value is tombstone. However, HashKV could skip compacted revision.

https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/server/storage/mvcc/hash.go#L60-L76

For the compacted revision, the lower.GreaterThan(kr) always return true. Assume that the compacted revision is for key A. If the key A has revisions higher than compacted revision, the HashKV won't compute the hash on compacted revision.

Case 1: ETCD contains more than two keys

For example, there are two keys in store.

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{2, 0}, {3, 0}, {4, 0}(t)[3]}

// key: "foo2"
// modified: 2
// generations:
//    {{2, 1}[1]}

After compaction on Revision{Main: 5}, the store will be like

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}

// key: "foo2"
// modified: 2
// generations:
//    {{2, 1}[1]}

When we perform HashKV on Revision{Main: 7}, the hash should compute the following revisions:

  • Revision{Main: 2, Sub: 1}
  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}

However, HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns two revisions {Main: 2, Sub: 1} and {Main: 7}, The Revision{Main: 5} isn't in result of kvindex.Keep.

Case 2: ETCD contains only one key

There is key foo.

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{2, 0}, {3, 0}, {4, 0}(t)[3]}

After compaction on Revision{Main: 5}, the store will be like

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}

When we perform HashKV on Revision{Main: 7}, the hash should compute the following revisions:

  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}

HashKV skips Revision{Main: 5} because the compacted revision is {Main: 5} and kvindex.Keep returns one revision {Main: 7}.

However, when we perform HashKV on Revision{Main: 8}, the kvindex.Keep will return empty keep because the Revision{Main: 8} is tombstone. So, the hash result will compute the following revisions, which is expectation.

  • Revision{Main: 5, Sub: 0}
  • Revision{Main: 6, Sub: 0}
  • Revision{Main: 7, Sub: 0}
  • Revision{Main: 8, Sub: 0}

What did you expect to happen?

HashKV should compute the hash of all MVCC keys up to a given revision

How can we reproduce it (as minimally and precisely as possible)?

https://github.com/fuweid/etcd/commit/a6454f2fcc13aa116086b4df907541d3942b3b00

$ cd server/storage/mvcc
$ go test -v -run TestHashKVImpactedByKeep -count=1 ./

Anything else we need to know?

The HashKV was introduced by #8263. IMO, the keep, available revision map, was used to skip the revisions which were marked deleted in on-going compaction. If there is no compaction, the keep is always empty and then HashKV will compute all the MVCC keys up to the revision.

#8263 has bug and then #8333 introduced keyIndex.Keep interface to fix bug. However, after #8333, the non-empty keep map forces HashKV to skip compacted revision.

Etcd version (please run commands below)

  • main
  • release-3.5
  • release-3.4

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

fuweid avatar Jul 16 '24 13:07 fuweid

Value return by HashKV api call till x revision should be equal to the Hash of snapshot taken upto x revision (removing the appended hash). Is this correct @fuweid ?

IMO, this is not correct as HashKV api call calculates the hash of all MVCC key-values, whereas snapshot is snapshot of etcd db which also contains cluster information, hence the hash will not be same.

ishan16696 avatar Jul 17 '24 06:07 ishan16696

Hi @ishan16696

the Hash of snapshot taken upto x revision

I don't follow this. Do you mean Hash API? If so, Hash will compute all MVCC keys, confState and other buckets in bbolt.

https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/api/etcdserverpb/rpc.proto#L226-L238

fuweid avatar Jul 18 '24 01:07 fuweid

Hi @fuweid , What I mean by snapshot upto x revision is:

  • Suppose etcd is at x revision then somebody takes a snapshot of etcd using snapshot api call. https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/api/etcdserverpb/rpc.proto#L248-L254

Than to validate the integrity of this snapshot, If I calculate the hash this snapshot taken via snapshot api call upto x revision(removing appended hash) , will it be same as value return by Hash API call ?

ishan16696 avatar Jul 18 '24 07:07 ishan16696

Anyway, my doubt is different from this issue, I have now open a separate issue now: https://github.com/etcd-io/etcd/issues/18340

ishan16696 avatar Jul 18 '24 07:07 ishan16696

Than to validate the integrity of this snapshot, If I calculate the hash this snapshot taken via snapshot api call upto x revision(removing appended snapshot)

The Snapshot uses sha256sum to compute the whole database.

https://github.com/etcd-io/etcd/blob/e7f572914d79f0705b3dc8ca28d9a14b0f854d49/server/etcdserver/api/v3rpc/maintenance.go#L145

However, both Hash and HashKV are using crc32.New(crc32.MakeTable(crc32.Castagnoli)). They are different.

will it be same as value return by Hash API call ?

Unfortunately, it's not.

fuweid avatar Jul 18 '24 09:07 fuweid

Compaction is tightly coupled with HashKV, we have to consider/discuss them together. We need to get consensus on https://github.com/etcd-io/etcd/pull/18274#issuecomment-2223188506 firstly.

As mentioned in that comment, compatibility (ensure new version and old version generate the same hash value) is also a big concern. We need to leverage the cluster-level feature to resolve it.

Probably it would be better to turn that into a KEP "Consistent Compaction and HashKV". cc @ivanvc @jmhbnz @serathius @siyuanfoundation

ahrtr avatar Jul 19 '24 11:07 ahrtr

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 11:04 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 28 '25 00:06 github-actions[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 28 '25 00:08 github-actions[bot]

closed it due to no bandwidth. feel free to reopen it if you want to carry it.

fuweid avatar Aug 28 '25 15:08 fuweid