Update EBNF for admin statements and add checksum details
What is changed, added or deleted? (Required)
-
Some pages about specific admin statements also had EBNF for other admin statements.
-
Added some info to the page about
ADMIN CHECKSUM TABLEabout what checksum can be expected the same.
Closes #15468
- Formatted the
AdminStmtEBNF to be more readable.
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
- [x] master (the latest development version)
- [x] v8.2 (TiDB 8.2 versions)
- [x] v8.1 (TiDB 8.1 versions)
- [ ] v8.0 (TiDB 8.0 versions)
- [ ] v7.6 (TiDB 7.6 versions)
- [ ] v7.5 (TiDB 7.5 versions)
- [ ] v7.1 (TiDB 7.1 versions)
- [ ] v6.5 (TiDB 6.5 versions)
- [ ] v6.1 (TiDB 6.1 versions)
- [ ] v5.4 (TiDB 5.4 versions)
- [ ] v5.3 (TiDB 5.3 versions)
- [ ] v5.2 (TiDB 5.2 versions)
- [ ] v5.1 (TiDB 5.1 versions)
What is the related PR or file link(s)?
- This PR is translated from:
- Other reference link(s):
Do your changes match any of the following descriptions?
- [ ] Delete files
- [ ] Change aliases
- [ ] Need modification after applied to another branch
- [ ] Might cause conflicts after applied to another branch
With the example from the checksum docs:
mysql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.0.0
Edition: Community
Git Commit Hash: 43cf9a2245b91c647b46816ad3d5424ef90f1070
Git Branch: HEAD
UTC Build Time: 2024-03-27 09:37:44
GoVersion: go1.21.4
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.01 sec)
So this is v8.0.0 with TiKV (not unistore). This is on Linux with a TiUP Playground.
mysql> CREATE TABLE t1(id INT PRIMARY KEY);
Query OK, 0 rows affected (0.09 sec)
mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test | t1 | 17934567675667613934 | 3 | 75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)
The above is what the docs have in the example, but with a different checksum.
mysql> DELETE FROM t1;
Query OK, 3 rows affected (0.01 sec)
mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test | t1 | 0 | 0 | 0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)
mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test | t1 | 17934567675667613934 | 3 | 75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)
So if we remove all rows then the checksum is 0 as it isn't checksumming any rows. Then if we add the rows back then the checksum is the same again.
mysql> TRUNCATE TABLE t1;
Query OK, 0 rows affected (0.11 sec)
mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test | t1 | 0 | 0 | 0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)
mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test | t1 | 11772166126661881374 | 3 | 75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)
But if we truncate the table instead of deleting the data and then add the rows back then the checksum is different. So the checksum seems to depend on the prefix with the table_id.
Note that the table has one column, which is an int, which is 4 bytes, but here each row shows 25 bytes.
Btw. Looks like this is where the checksum is calculated:
https://github.com/tikv/tikv/blob/fba1fb2f65620b7ab1c03a9cf918ef3968969398/src/coprocessor/checksum.rs#L83
I think TiDB Lightning executes this command when using a tidb backend, but isn't using the result?
With Unistore it behaves like this:
sql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test | t1 | 1 | 1 | 1 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.0008 sec)
sql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 6 |
+----------+
1 row in set (0.0013 sec)
So the values it reports are static. Should we put a note in the docs about this?
@dveeden I believe the ADMIN CHECKSUM is simply not implemented in UniStore. However this behavior difference should not be documented, as UniStore is not documented and user will be confused.
Ref: Lightning calculates the Checksum in the same way, so that Lightning could know the integrity of the imported data.
https://github.com/pingcap/tidb/blob/8f958dec25cf100370f0e5e261aa7f43aaf3d118/pkg/lightning/verification/checksum.go#L60
Considering that this SQL is only used by TiDB Lightning and doesn't seem to provide any user benefits when it is used standalone (Table ID is counted in the Checksum so that it is actually a Physical / Internal Data Checksum), I think we may even consider hiding this document? Then user could find something more useful, with less noise.
@breezewish Some customer chooses to skip the ADMIN CHECKSUM execution in Lightning (set checksum = "off") and perform the check later manually. I'm not sure what to mean by "hiding" but it would be beneficial for the support team to still have some kind of documentation to point to about how to execute the command.
@kennytm: adding LGTM is restricted to approvers and reviewers in OWNERS files.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks @kennytm @breezewish
-
I agree that we don't need to document this for Unistore.
-
I have updated the
ADMIN CHECKSUM TABLEtext to better reflect what TiDB Lightning does. PTAL. I hope this makes it more clear to users thatADMIN CHECKSUM TABLEis linked to TiDB Lightning and not a replacement forCHECKSUM TABLEin MySQL. -
It looks like we could modify this in TiKV/TiDB to only checksum the table data, providing a more stable checksum that isn't influenced by
table_idetc. This could maybe be used to have something similar toCHECKSUM TABLE, even if the values are not the same as MySQL (different algorithm?). Note that MySQL checksum is dependent on the row format etc, so users are probably not expecting this to be very stable. Note that I would suggest to add this functionality instead of replacing or chaningADMIN CHECKSUM TABLE. This would implement https://github.com/pingcap/tidb/issues/1895 If we do this we might want to implement per-partition checksum (cc @mjonss )
[LGTM Timeline notifier]
Timeline:
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: qiancai
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [qiancai]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest
In response to a cherrypick label: new pull request created to branch release-8.1: #18156.
In response to a cherrypick label: new pull request created to branch release-8.2: #18157.