cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

kv: verify roachpb.Value checksums when building snapshots

Open lyang24 opened this issue 1 year ago • 2 comments

This commit verify record checksum verification when processing raft snapshot records. This process prevents data corruption spread via snapshot process. Cluster setting snapshotChecksumVerification is introduced to act as a safeguard knob to turn on/off the verifcation.

Fixes: https://github.com/cockroachdb/cockroach/issues/110572

Release note: None

lyang24 avatar Feb 18 '24 07:02 lyang24

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

  • @sumeerbhola (commented on cockroachdb/cockroach#110572)
  • @jbowens (commented on cockroachdb/cockroach#110572)
  • @erikgrinaker (author of cockroachdb/cockroach#110572)

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Feb 18 '24 07:02 blathers-crl[bot]

This change is Reviewable

cockroach-teamcity avatar Feb 18 '24 07:02 cockroach-teamcity

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Feb 24 '24 20:02 blathers-crl[bot]

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Feb 24 '24 21:02 blathers-crl[bot]

I digged more into the test failure with @pav-kv, we added prints to mvccput and found the orginal value being put is a SimpleMVCCValue

key: /Local/RangeID/68/r/RangeGCThreshold
value: [7a 00 c9 92 03]
proto: 0,0
checksum: 7a00c992
computed checksum: 7a00c992

the value matches the suffix of the error log in the checksum verification [12 04 08 00 10 00 18 00 20 00 28 00 32 05 7a 00 c9 92 03] which leads us to think it become an ExtendedMVCCValue in the raft snaphot.

The confusion part is decode simple mvcc value would work on the [12 04 08 00 10 00 18 00 20 00 28 00 32 05 7a 00 c9 92 03] value and 12 04 08 00 leads to wrong checksum. I am expecting it to return false and redirect to decodeExtendedMVCCValue(r.Value()). I think Erik is ooo, hi @jbowens do you know a better way to decode the value from batchreader?

lyang24 avatar Feb 27 '24 20:02 lyang24

Previous thread is resolved, I wrapped decoding logic under the if mvccKey.IsValue() just like here test passes, I guess mvcckey without a timestamp is another category of mvccdata.

	if mvccKey.IsValue() {
		mvccValue, ok, err := tryDecodeSimpleMVCCValue(r.Value())
		if !ok && err == nil {
			mvccValue, err = decodeExtendedMVCCValue(r.Value())
		}
		if err == nil {
			err = mvccValue.Value.Verify(mvccKey.Key)
		}
	}

lyang24 avatar Feb 27 '24 20:02 lyang24

pkg/kv/kvserver/store_snapshot.go line 555 at r7 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

A bunch of func calls below (e.g. PutInternalRangeKey) will decode the EngineKey internally. I wonder if the APIs should accept EngineKey instead of the raw one. Probably not to be addressed in PR.

left a todo on this, personally I think all the method below can take a decoded engine key as input

lyang24 avatar Mar 25 '24 22:03 lyang24

unfortunately this pr experienced the same test failures will wait for the fix of 121342 then rebase and merge https://github.com/cockroachdb/cockroach/issues/121342

lyang24 avatar Mar 29 '24 18:03 lyang24

bors r+

lyang24 avatar Mar 30 '24 19:03 lyang24