tiflow: drop index is not synced to the secondary when a 3-minute secondary failure is simulated during add index and then drop index on the primary


Open Lily2025 opened this issue 11 months ago • 2 comments

What did you do?

1. Restore data on the primary and the secondary.
2. Create changefeeds and set the BDR role on the primary and the secondary.
3. Run sysbench on the primary and the secondary.
4. On the primary, add an index, then drop it once the add index has finished.
5. Simulate a secondary failure lasting 3 minutes during the add index and drop index on the primary.

Chaos start time: 2024-03-06 00:11:33
Chaos end time: 2024-03-06 00:14:33

ticdc logs: endless-ha-test-bdr-ddl-tps-7230009-1-294.tar.gz

What did you expect to see?

After the fault is recovered, the DDL statements sync successfully.

What did you see instead?

After the fault was recovered, the drop index was not synced to the secondary.

primary: img_v3_028n_16936ad7-2458-4906-9d7a-7804dbc5523g

secondary: img_v3_028n_f00e7d31-8c68-463a-85b1-020287f25b3g

Versions of the cluster

./cdc version
Release Version: v8.0.0-alpha
Git Commit Hash: fcd4bfa5b89d41b3de663e1966f54bb80a680fe7
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-03-04 11:36:45
Go Version: go version go1.21.6 linux/amd64
Failpoint Build: false

current status of DM cluster (execute `query-status <task-name>` in dmctl)

No response

Mar 06 '24 02:03 Lily2025

/remove-area dm
/area ticdc

Mar 06 '24 02:03 Lily2025

/assign asddongmen

Mar 06 '24 02:03 Lily2025

The issue arises because CDC executes the add index operation asynchronously, to avoid the replication lag the DDL would otherwise cause. As a result, when CDC executes the drop index, it cannot guarantee that the preceding add index has already entered the downstream DDL pending queue. In that case, the downstream may report that the index does not exist yet when the drop index runs. CDC ignores this error and treats the DDL as one that can be skipped, so the drop index is never applied downstream.
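The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not tiflow's actual code: `Downstream`, `ToySink`, `IndexNotFound`, and the index name `idx_k` are hypothetical names made up for illustration, and the "asynchronous" add index is modeled simply as a deferred list that is flushed later.

```python
class IndexNotFound(Exception):
    """Hypothetical stand-in for the downstream's 'index does not exist' error."""


class Downstream:
    """Toy model of the secondary cluster: just a set of index names."""

    def __init__(self):
        self.indexes = set()

    def add_index(self, name):
        self.indexes.add(name)

    def drop_index(self, name):
        if name not in self.indexes:
            raise IndexNotFound(name)
        self.indexes.remove(name)


class ToySink:
    """Toy replicator: add index is deferred, drop index runs immediately."""

    def __init__(self, downstream):
        self.downstream = downstream
        self.pending = []  # add index DDLs not yet applied downstream
        self.skipped = []  # DDLs skipped because their error was ignored

    def replicate(self, ddl, name):
        if ddl == "add index":
            # Asynchronous in spirit: queue it and return without waiting.
            self.pending.append(name)
        elif ddl == "drop index":
            try:
                self.downstream.drop_index(name)
            except IndexNotFound:
                # The error is treated as ignorable, so the DDL is skipped.
                self.skipped.append((ddl, name))

    def flush_pending(self):
        # The deferred add index finally runs, e.g. after the outage ends.
        for name in self.pending:
            self.downstream.add_index(name)
        self.pending.clear()


def run_scenario():
    downstream = Downstream()
    sink = ToySink(downstream)
    sink.replicate("add index", "idx_k")
    sink.replicate("drop index", "idx_k")  # arrives before the add has run
    sink.flush_pending()
    return downstream.indexes, sink.skipped
```

In this model, `run_scenario()` leaves `idx_k` present downstream even though the primary has already dropped it, which matches the symptom reported in this issue. The sketch only shows why ignoring the error is unsafe when the add index is still outstanding; it says nothing about how the fix should be implemented.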

Mar 14 '24 06:03 asddongmen
