dumpling
add checksum to make sure dumped data is correct
Feature Request
Is your feature request related to a problem? Please describe:
Currently, we cannot tell whether dumped data is correct because there is no checksum mechanism.
Describe the feature you'd like:
Add checksum to make sure the dumped data is correct.
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Optimization:
First, we should record the SHA-256 and size of each file, and the COUNT(*) of each table, somewhere. I suggest we do not reuse the metadata file; its format is not good for verification.
Then, we do the actual checksum of data.
MySQL has the CHECKSUM TABLE statement but this is not supported by TiDB (pingcap/tidb#1895). Furthermore, MySQL's CHECKSUM TABLE is not guaranteed to be stable, nor is the checksum method explicitly documented. So let's ignore this feature.
We could reuse sync-diff-inspector's CRC32 checksum from https://github.com/pingcap/tidb-tools/blob/0297393b93b9dbc57fc07a17c898dd621467ef7f/pkg/dbutil/common.go#L373, but we had better change the function signature so that it does not take a *model.TableInfo 😏.
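To make the idea concrete, here is a minimal sketch of an order-independent CRC32 table checksum computed on the Go side. It is not the linked tidb-tools function, and it has not been checked against what that function produces. `rowCRC32` and `tableChecksum` are hypothetical names, and the row encoding (comma-joined columns, `\N` for NULL) is an assumption made only for illustration.

```go
package main

import (
	"hash/crc32"
	"strings"
)

// rowCRC32 hashes one row. A nil column stands for SQL NULL.
// Assumption: columns are joined with ',' and NULL is written as `\N`;
// a real implementation needs an unambiguous encoding.
func rowCRC32(cols []*string) uint32 {
	var b strings.Builder
	for i, c := range cols {
		if i > 0 {
			b.WriteByte(',')
		}
		if c == nil {
			b.WriteString(`\N`)
		} else {
			b.WriteString(*c)
		}
	}
	return crc32.ChecksumIEEE([]byte(b.String()))
}

// tableChecksum XORs the per-row CRC32 values, so the result does not
// depend on the order in which rows are visited. It also returns the
// row count, because XOR alone cannot detect a row that appears twice.
func tableChecksum(rows [][]*string) (count uint64, checksum uint32) {
	for _, r := range rows {
		checksum ^= rowCRC32(r)
		count++
	}
	return count, checksum
}
```

Because two identical rows cancel out under XOR, the checksum is only meaningful together with the row count, which is one more reason to record COUNT(*) per table.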