docs icon indicating copy to clipboard operation
docs copied to clipboard

Update EBNF for admin statements and add checksum details

Open dveeden opened this issue 1 year ago • 9 comments

What is changed, added or deleted? (Required)

  1. Some pages about specific admin statements also had EBNF for other admin statements.

  2. Added some info to the page about ADMIN CHECKSUM TABLE about what checksum can be expected the same.

Closes #15468

  1. Formatted the AdminStmt EBNF to be more readable.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • [x] master (the latest development version)
  • [x] v8.2 (TiDB 8.2 versions)
  • [x] v8.1 (TiDB 8.1 versions)
  • [ ] v8.0 (TiDB 8.0 versions)
  • [ ] v7.6 (TiDB 7.6 versions)
  • [ ] v7.5 (TiDB 7.5 versions)
  • [ ] v7.1 (TiDB 7.1 versions)
  • [ ] v6.5 (TiDB 6.5 versions)
  • [ ] v6.1 (TiDB 6.1 versions)
  • [ ] v5.4 (TiDB 5.4 versions)
  • [ ] v5.3 (TiDB 5.3 versions)
  • [ ] v5.2 (TiDB 5.2 versions)
  • [ ] v5.1 (TiDB 5.1 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • [ ] Delete files
  • [ ] Change aliases
  • [ ] Need modification after applied to another branch
  • [ ] Might cause conflicts after applied to another branch

dveeden avatar May 24 '24 13:05 dveeden

With the example from the checksum docs:

mysql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.0.0
Edition: Community
Git Commit Hash: 43cf9a2245b91c647b46816ad3d5424ef90f1070
Git Branch: HEAD
UTC Build Time: 2024-03-27 09:37:44
GoVersion: go1.21.4
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.01 sec)

So this is v8.0.0 with TiKV (not unistore). This is on Linux with a TiUP Playground.

mysql> CREATE TABLE t1(id INT PRIMARY KEY);
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 17934567675667613934 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

The above is what the docs have in the example, but with a different checksum.

mysql> DELETE FROM t1;
Query OK, 3 rows affected (0.01 sec)

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  0 |         0 |           0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 17934567675667613934 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

So if we remove all rows then the checksum is 0 as it isn't checksumming any rows. Then if we add the rows back then the checksum is the same again.

mysql> TRUNCATE TABLE t1;
Query OK, 0 rows affected (0.11 sec)

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  0 |         0 |           0 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.00 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ADMIN CHECKSUM TABLE t1;
+---------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor   | Total_kvs | Total_bytes |
+---------+------------+----------------------+-----------+-------------+
| test    | t1         | 11772166126661881374 |         3 |          75 |
+---------+------------+----------------------+-----------+-------------+
1 row in set (0.00 sec)

But if we truncate the table instead of deleting the data and then add the rows back then the checksum is different. So the checksum seems to depend on the prefix with the table_id.

Note that the table has one column, which is an int, which is 4 bytes, but here each row shows 25 bytes.

dveeden avatar May 24 '24 14:05 dveeden

Btw. Looks like this is where the checksum is calculated:

https://github.com/tikv/tikv/blob/fba1fb2f65620b7ab1c03a9cf918ef3968969398/src/coprocessor/checksum.rs#L83

dveeden avatar May 24 '24 14:05 dveeden

I think TiDB Lightning executes this command when using a tidb backend, but isn't using the result?

dveeden avatar May 24 '24 14:05 dveeden

With Unistore it behaves like this:

sql> ADMIN CHECKSUM TABLE t1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| test    | t1         |                  1 |         1 |           1 |
+---------+------------+--------------------+-----------+-------------+
1 row in set (0.0008 sec)

sql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|        6 |
+----------+
1 row in set (0.0013 sec)

So the values it reports are static. Should we put a note in the docs about this?

dveeden avatar May 24 '24 14:05 dveeden

@dveeden I believe the ADMIN CHECKSUM is simply not implemented in UniStore. However this behavior difference should not be documented, as UniStore is not documented and user will be confused.

Ref: Lightning calculates the Checksum in the same way, so that Lightning could know the integrity of the imported data.

https://github.com/pingcap/tidb/blob/8f958dec25cf100370f0e5e261aa7f43aaf3d118/pkg/lightning/verification/checksum.go#L60

Considering that this SQL is only used by TiDB Lightning and doesn't seem to provide any user benefits when it is used standalone (Table ID is counted in the Checksum so that it is actually a Physical / Internal Data Checksum), I think we may even consider hiding this document? Then user could find something more useful, with less noise.

breezewish avatar May 24 '24 14:05 breezewish

@breezewish Some customer chooses to skip the ADMIN CHECKSUM execution in Lightning (set checksum = "off") and perform the check later manually. I'm not sure what to mean by "hiding" but it would be beneficial for the support team to still have some kind of documentation to point to about how to execute the command.

kennytm avatar May 25 '24 03:05 kennytm

@kennytm: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot[bot] avatar May 25 '24 03:05 ti-chi-bot[bot]

Thanks @kennytm @breezewish

  1. I agree that we don't need to document this for Unistore.

  2. I have updated the ADMIN CHECKSUM TABLE text to better reflect what TiDB Lightning does. PTAL. I hope this makes it more clear to users that ADMIN CHECKSUM TABLE is linked to TiDB Lightning and not a replacement for CHECKSUM TABLE in MySQL.

  3. It looks like we could modify this in TiKV/TiDB to only checksum the table data, providing a more stable checksum that isn't influenced by table_id etc. This could maybe be used to have something similar to CHECKSUM TABLE, even if the values are not the same as MySQL (different algorithm?). Note that MySQL checksum is dependent on the row format etc, so users are probably not expecting this to be very stable. Note that I would suggest to add this functionality instead of replacing or chaning ADMIN CHECKSUM TABLE. This would implement https://github.com/pingcap/tidb/issues/1895 If we do this we might want to implement per-partition checksum (cc @mjonss )

dveeden avatar May 27 '24 08:05 dveeden

[LGTM Timeline notifier]

Timeline:

  • 2024-07-05 07:39:24.531506854 +0000 UTC m=+1569291.016995688: :ballot_box_with_check: agreed by qiancai.
  • 2024-07-09 06:29:55.561260773 +0000 UTC m=+339092.796494884: :ballot_box_with_check: agreed by hfxsd.

ti-chi-bot[bot] avatar Jul 09 '24 06:07 ti-chi-bot[bot]

/approve

qiancai avatar Jul 09 '24 10:07 qiancai

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiancai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot[bot] avatar Jul 09 '24 10:07 ti-chi-bot[bot]

/retest

qiancai avatar Jul 09 '24 10:07 qiancai

In response to a cherrypick label: new pull request created to branch release-8.1: #18156.

ti-chi-bot avatar Jul 09 '24 10:07 ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-8.2: #18157.

ti-chi-bot avatar Jul 09 '24 10:07 ti-chi-bot