
add checksum to make sure dumped data is correct

Open • lichunzhu opened this issue 4 years ago • 1 comment

Feature Request

Is your feature request related to a problem? Please describe:

Currently, we cannot tell whether dumped data is correct, because there is no checksum mechanism.

Describe the feature you'd like:

Add checksum to make sure the dumped data is correct.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Optimization:

lichunzhu • Mar 23 '21 10:03

First, we should record the SHA-256 and size of each file, plus the COUNT(*) of each table, somewhere. I suggest we do not reuse the metadata file, because its format is not well suited to verification.

Then, we do the actual checksum of data.

MySQL has the CHECKSUM TABLE statement but this is not supported by TiDB (pingcap/tidb#1895). Furthermore, MySQL's CHECKSUM TABLE is not guaranteed to be stable, nor is the checksum method explicitly documented. So let's ignore this feature.

We could reuse sync-diff-inspector's CRC32 checksum from https://github.com/pingcap/tidb-tools/blob/0297393b93b9dbc57fc07a17c898dd621467ef7f/pkg/dbutil/common.go#L373, but we'd better change the function signature so that it does not take a *model.TableInfo 😏.
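To make the signature point concrete, here is a sketch of a query builder that takes only plain column names. It follows the general idea of XOR-ing a per-row CRC32 so the result does not depend on row order; it is not copied from the linked function, whose exact query may differ. `buildChecksumQuery` and `quote` are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// quote wraps an identifier in backticks, doubling any embedded backtick.
func quote(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildChecksumQuery returns a SQL statement that XORs the CRC32 of every row.
// It needs only the schema, table, and column names, not a *model.TableInfo.
func buildChecksumQuery(schema, table string, columns []string) (string, error) {
	if len(columns) == 0 {
		return "", errors.New("no columns given")
	}
	cols := make([]string, 0, len(columns))
	nulls := make([]string, 0, len(columns))
	for _, c := range columns {
		cols = append(cols, quote(c))
		// CONCAT_WS skips NULL arguments, so the NULL flags are appended
		// to tell a NULL apart from an empty string.
		nulls = append(nulls, "ISNULL("+quote(c)+")")
	}
	return fmt.Sprintf(
		"SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', %s, CONCAT(%s))) AS UNSIGNED)) AS checksum FROM %s.%s",
		strings.Join(cols, ", "), strings.Join(nulls, ", "), quote(schema), quote(table),
	), nil
}

func main() {
	q, err := buildChecksumQuery("test", "t", []string{"id", "name"})
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

The same expression would have to be evaluated over the dumped rows as well (or the dump re-imported and queried) for the two checksums to be comparable, so how values are rendered as strings matters on both sides.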

kennytm • Mar 24 '21 07:03