
CDC initial scan is not well parallelized

fubinzh opened this issue

What did you do?

  1. Deploy a TiDB cluster with 2 CDC nodes. TiKV configuration: cdc.incremental-fetch-speed-limit = 300MiB, cdc.incremental-scan-speed-limit = 96MiB.
  2. Start a cdc changefeed and then pause it.
[root@tc-ticdc-0 /]# /cdc cli changefeed --server http://127.0.0.1:8301 query -c test1
{
  "upstream_id": 7301604617795285762,
  "namespace": "default",
  "id": "test1",
  "sink_uri": "blackhole:",
  "config": {
    "memory_quota": 1073741824,
    "case_sensitive": true,
    "force_replicate": false,
    "ignore_ineligible_table": false,
    "check_gc_safe_point": true,
    "filter": {
      "rules": [
        "*.*"
      ]
    },
    "mounter": {
      "worker_num": 16
    },
    "sink": {
      "protocol": "",
      "transaction_atomicity": "",
      "terminator": "\r\n",
      "delete_only_output_handle_key_columns": null
    },
    "scheduler": {
      "enable_table_across_nodes": false,
      "region_threshold": 100000,
      "write_key_threshold": 0
    },
    "integrity": {
      "integrity_check_level": "none",
      "corruption_handle_level": "warn"
    },
    "changefeed_error_stuck_duration": 1800000000000,
    "sql_mode": "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
  },
  "create_time": "2023-11-15 18:14:51.806",
  "start_ts": 445718200320000000,
  "resolved_ts": 445768313518424347,
  "target_ts": 0,
  "checkpoint_tso": 445733430529884442,
  "checkpoint_time": "2023-11-19 04:08:18.640",
  "state": "normal",
  "creator_version": "v6.5.3",
  "task_status": [
    {
      "capture_id": "11512494-c874-43ed-b111-831962ed47ca",
      "table_ids": [
        84,
        88,
        91,
        94
      ]
    },
    {
      "capture_id": "f5cf9c55-47ec-4c4c-8527-010eccf9a8e5",
      "table_ids": [
        90,
        93,
        96,
        86
      ]
    }
  ]
}

  3. Run a TPC-C workload for e.g. 24h to generate a large backlog of logs for CDC to scan.
  4. Resume the cdc changefeed.
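The `task_status` section of the query output above shows that the tables are already split evenly between the two captures, which is why one would expect both nodes to scan in parallel. A minimal sketch (Python, with the IDs copied from the output above) of checking that distribution:

```python
import json

# task_status as reported by `cdc cli changefeed query` above
task_status = json.loads("""
[
  {"capture_id": "11512494-c874-43ed-b111-831962ed47ca", "table_ids": [84, 88, 91, 94]},
  {"capture_id": "f5cf9c55-47ec-4c4c-8527-010eccf9a8e5", "table_ids": [90, 93, 96, 86]}
]
""")

# Each capture owns the same number of tables, so in principle the
# initial scan workload could run on both nodes concurrently.
counts = {t["capture_id"]: len(t["table_ids"]) for t in task_status}
print(counts)
```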

What did you expect to see?

Both CDC nodes should perform initial scan work at the same time.

What did you see instead?

For the first half of the initial scan, cdc-0 carried most of the workload; for the second half, cdc-1 did. This resulted in a long overall initial scan time.
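The impact can be illustrated with a toy calculation (hypothetical numbers, not measurements from this cluster): if the two nodes scan one after the other instead of concurrently, the total initial scan time is roughly the sum rather than the maximum of the per-node scan times.

```python
# Hypothetical per-node initial scan durations in minutes (illustrative only)
scan_minutes = {"cdc-0": 30.0, "cdc-1": 30.0}

sequential_total = sum(scan_minutes.values())  # nodes scan one after the other
parallel_total = max(scan_minutes.values())    # nodes scan at the same time

# With equal workloads, the sequential pattern takes twice as long
print(sequential_total, parallel_total)
```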

[Screenshots: per-node CDC workload during the initial scan]

Versions of the cluster

[root@tc-ticdc-0 /]# /cdc version
Release Version: v7.5.0
Git Commit Hash: 99c1f8fdffe72f2a9dbce6d0b58a52a162ce72b7
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2023-11-16 10:33:24
Go Version: go version go1.21.3 linux/amd64
Failpoint Build: false

fubinzh · Nov 20 '23

/severity moderate

fubinzh · Nov 20 '23

It should not be considered a bug.

asddongmen · May 21 '24