matrixone [Bug]: restore 2T data from snapshot report 'table does not exist'.

Is there an existing issue for the same bug?

[X] I have checked the existing issues.

Branch Name

main

Commit ID

cf5296b

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

恢复了大约5h后报错 table does not exist

mysql> restore account sys from snapshot sp01;


ERROR 1064 (HY000): SQL parser error: table "table_with_pk_index_for_write_1b" does not exist

日志：https://grafana.ci.matrixorigin.cn/explore?panes=%7B%223wf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20241016%5C%22%7D%20%7C%3D%20%60sp01%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221729137600000%22,%22to%22:%221729155600000%22%7D%7D%7D&schemaVersion=1&orgId=1

快照读能读到数据：

Expected Behavior

No response

Steps to Reproduce

步骤：
create snapshot sp01 for account sys ;
drop database big_data_test;
restore account sys from snapshot sp01;

big_data_test中有28张表，数据量大约有2T，找不到的表table_with_pk_index_for_write_1b的schema：
create table if not exists big_data_test.table_with_pk_index_for_write_1B( id bigint primary key, col1 tinyint, col2 smallint, col3 int, col4 bigint, col5 tinyint unsigned, col6 smallint unsigned, col7 int unsigned, col8 bigint unsigned, col9 float, col10 double, col11 varchar(255), col12 Date, col13 DateTime, col14 timestamp, col15 bool, col16 decimal(16,6), col17 text, col18 json, col19 blob, col20 binary(255), col21 varbinary(255), col22 vecf32(3), col23 vecf32(3), col24 vecf64(3), col25 vecf64(3));

Additional information

No response

Oct 17 '24 09:10 Ariznawlll

Oct 17 '24 09:10 YANGGMM

原因已大致定位到.

Oct 18 '24 11:10 triump2020

今天恢复也有这个问题：步骤与issue中提到的基本一样

mysql> select git_version();
+---------------+
| git_version() |
+---------------+
| 29a0c5d       |
+---------------+
1 row in set (0.00 sec)

企业微信截图_31fc874c-78f7-469c-a56b-0ecef8a03476

快照读能读到数据：

log：https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22-Ur%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20241017%5C%22%7D%20%7C%3D%20%60txn%20is%20stale%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221729256400000%22,%22to%22:%221729260000000%22%7D%7D%7D&schemaVersion=1&orgId=1

Oct 18 '24 14:10 Ariznawlll

PR is on the way!

Oct 19 '24 09:10 triump2020

PR 可能只解决了，导致这个问题的原因之一，但如果概率比较大，可能还有其他原因，需要加日志再复现下.

Oct 22 '24 10:10 triump2020

下午根据pitr恢复2T数据也报该错误

commit: 8d7e7b8
恢复执行的sql： restore from pitr p01 "2024-10-22 03:24:18.547701"

log：https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22MBn%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20241021%5C%22%7D%20%7C%3D%20%60txn%20is%20stale%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221729587600000%22,%22to%22:%221729591200000%22%7D%7D%7D&schemaVersion=1&orgId=1

Oct 22 '24 11:10 Ariznawlll

又完善了Log ，线上，线下同时在复现。应该是其他原因，导致了这个问题.

Oct 23 '24 11:10 triump2020

复现步骤：

修改以下配置： [tn.Ckp] flush-interval = "5s" min-count = 1 scan-interval = "5s" incremental-interval = "10s" global-min-count = 3
修改程序: gcPartitionStateTicker = 5 * time.Second gcPartitionStateTimer = 90 * time.Second

1729753427198

运行mo-service
运行 sql: 1>create table tpcc_1000.bmsql_order_line

2>load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='','secret_access_key'='','bucket'='mo-load-guangzhou-1308875761','filepath'='tpcc_1000/order-line.csv', 'compression'=''} into table tpcc_1000.bmsql_order_line fields terminated by ',' lines terminated by '\n' parallel 'true';

3> create snapshot 1; 4> drop database tpcc_1000; 5> restore account sys from snapshot sp01;

Oct 24 '24 07:10 triump2020

原因已定位，等待修复. 是 txn is stale 的错误，导致了报表找不到的错误. txn is stale 的原因是 partition state 的 minTs, start, end 的数据不一致导致.

Oct 24 '24 07:10 triump2020

又完善了Log ，线上，线下同时在复现。应该是其他原因，导致了这个问题.

经过线下复现，原因就是第一个pr 所修复的，只是修复失败.

Oct 24 '24 07:10 triump2020

由Txn is stale 导致的 table not found 问题应该修复了，线下测试过好多次了. @Ariznawlll 请测试.

Oct 24 '24 12:10 triump2020

等待pr 合并

Oct 25 '24 13:10 triump2020

fixed

Oct 27 '24 03:10 XuPeng-SH

测试中

Oct 28 '24 02:10 Ariznawlll

用飞哥的复现方式没有出现，测试步骤

复现步骤：
mysql> select git_version();
+---------------+
| git_version() |
+---------------+
| 700ee56cf     |
+---------------+
1 row in set (0.00 sec)

修改 以下配置：
[tn.Ckp]
flush-interval = "5s"
min-count = 1
scan-interval = "5s"
incremental-interval = "10s"
global-min-count = 3

修改程序:
gcPartitionStateTicker = 5 * time.Second
gcPartitionStateTimer = 90 * time.Second

mysql> create table bmsql_order_line (
    ->   ol_w_id         integer   not null,
    ->   ol_d_id         integer   not null,
    ->   ol_o_id         integer   not null,
    ->   ol_number       integer   not null,
    ->   ol_i_id         integer   not null,
    ->   ol_delivery_d   timestamp,
    ->   ol_amount       decimal(6,2),
    ->   ol_supply_w_id  integer,
    ->   ol_quantity     integer,
    ->   ol_dist_info    char(24),
    ->   primary key (ol_w_id, ol_d_id, ol_o_id, ol_number)
    -> ) ;
Query OK, 0 rows affected (0.07 sec)

mysql> load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761','filepath'='tpcc_1000/order-line.csv', 'compression'=''} into table tpcc_1000.bmsql_order_line fields terminated by ',' lines terminated by '\n' parallel 'true';
Query OK, 300014864 rows affected (17 min 14.86 sec)

mysql> create snapshot sp01 for account sys;
Query OK, 0 rows affected (0.06 sec)

mysql> drop database tpcc_1000;
Query OK, 1 row affected (0.11 sec)

mysql> restore account sys from snapshot sp01;
Query OK, 0 rows affected (1 min 55.28 sec)

Oct 28 '24 06:10 Ariznawlll

整个集群大约有2T数据，恢复集群测试：

main commit：967435e

企业微信截图_e39c0d0d-35b9-4e25-8a29-d9bc6fe1fb30

恢复执行的sql：restore cluster from snapshot cluster_sp;

log：https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22tFc%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-snapshot-test-20241027%5C%22%7D%20%7C%3D%20%60table_with_pk_for_insert_100m%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221730183598993%22,%22to%22:%221730183688383%22%7D%7D%7D&schemaVersion=1&orgId=1

回归不通过

Oct 29 '24 07:10 Ariznawlll

恢复account级别，account的数据量大约2T

mysql> select git_version(); +---------------+ | git_version() | +---------------+ | 5c6909f | +---------------+ 1 row in set (0.00 sec)

企业微信截图_215120b5-f2a2-4cf1-be93-e63155772535

恢复成功

Oct 29 '24 08:10 Ariznawlll

测试结论：

集群级别的恢复失败，整个集群的数据约2T；
account级别的恢复成功，account的数据约2T.

Oct 29 '24 08:10 Ariznawlll

@Ariznawlll Pls test!

Oct 30 '24 03:10 triump2020

测试中

Oct 30 '24 10:10 Ariznawlll

1.66T集群恢复成功

Nov 01 '24 00:11 Ariznawlll

2.37T数据量集群恢复成功

Nov 02 '24 15:11 Ariznawlll