
CDC initial scan is not well parallelized

fubinzh opened this issue

What did you do?

  1. Deploy a TiDB cluster with 2 CDC nodes. TiKV configuration: cdc.incremental-fetch-speed-limit = 300MiB, cdc.incremental-scan-speed-limit = 96MiB.
  2. Start a cdc changefeed and then pause it.
[root@tc-ticdc-0 /]# /cdc cli changefeed --server http://127.0.0.1:8301 query -c test1
{
  "upstream_id": 7301604617795285762,
  "namespace": "default",
  "id": "test1",
  "sink_uri": "blackhole:",
  "config": {
    "memory_quota": 1073741824,
    "case_sensitive": true,
    "force_replicate": false,
    "ignore_ineligible_table": false,
    "check_gc_safe_point": true,
    "filter": {
      "rules": [
        "*.*"
      ]
    },
    "mounter": {
      "worker_num": 16
    },
    "sink": {
      "protocol": "",
      "transaction_atomicity": "",
      "terminator": "\r\n",
      "delete_only_output_handle_key_columns": null
    },
    "scheduler": {
      "enable_table_across_nodes": false,
      "region_threshold": 100000,
      "write_key_threshold": 0
    },
    "integrity": {
      "integrity_check_level": "none",
      "corruption_handle_level": "warn"
    },
    "changefeed_error_stuck_duration": 1800000000000,
    "sql_mode": "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
  },
  "create_time": "2023-11-15 18:14:51.806",
  "start_ts": 445718200320000000,
  "resolved_ts": 445768313518424347,
  "target_ts": 0,
  "checkpoint_tso": 445733430529884442,
  "checkpoint_time": "2023-11-19 04:08:18.640",
  "state": "normal",
  "creator_version": "v6.5.3",
  "task_status": [
    {
      "capture_id": "11512494-c874-43ed-b111-831962ed47ca",
      "table_ids": [
        84,
        88,
        91,
        94
      ]
    },
    {
      "capture_id": "f5cf9c55-47ec-4c4c-8527-010eccf9a8e5",
      "table_ids": [
        90,
        93,
        96,
        86
      ]
    }
  ]
}

  3. Run a TPC-C workload for e.g. 24h to generate a large backlog of logs for CDC to scan.
  4. Resume the cdc changefeed.
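The `task_status` section of the query output above shows that the tables are already split evenly between the two captures, which is why one would expect both nodes to scan in parallel. A minimal sketch (Python, with the IDs copied from the output above) of checking that distribution:

```python
import json

# task_status as reported by `cdc cli changefeed query` above
task_status = json.loads("""
[
  {"capture_id": "11512494-c874-43ed-b111-831962ed47ca", "table_ids": [84, 88, 91, 94]},
  {"capture_id": "f5cf9c55-47ec-4c4c-8527-010eccf9a8e5", "table_ids": [90, 93, 96, 86]}
]
""")

# Each capture owns the same number of tables, so in principle the
# initial scan workload could run on both nodes concurrently.
counts = {t["capture_id"]: len(t["table_ids"]) for t in task_status}
print(counts)
```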

What did you expect to see?

Both CDC nodes should perform initial scan work at the same time.

What did you see instead?

For the first half of the initial scan, cdc-0 carried most of the workload; for the second half, cdc-1 did. This resulted in a long overall initial scan time.
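The impact can be illustrated with a toy calculation (hypothetical numbers, not measurements from this cluster): if the two nodes scan one after the other instead of concurrently, the total initial scan time is roughly the sum rather than the maximum of the per-node scan times.

```python
# Hypothetical per-node initial scan durations in minutes (illustrative only)
scan_minutes = {"cdc-0": 30.0, "cdc-1": 30.0}

sequential_total = sum(scan_minutes.values())  # nodes scan one after the other
parallel_total = max(scan_minutes.values())    # nodes scan at the same time

# With equal workloads, the sequential pattern takes twice as long
print(sequential_total, parallel_total)
```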

[Screenshots: per-node CDC workload during the initial scan]

Versions of the cluster

[root@tc-ticdc-0 /]# /cdc version
Release Version: v7.5.0
Git Commit Hash: 99c1f8fdffe72f2a9dbce6d0b58a52a162ce72b7
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2023-11-16 10:33:24
Go Version: go version go1.21.3 linux/amd64
Failpoint Build: false

fubinzh · Nov 20 '23

/severity moderate

fubinzh · Nov 20 '23

It should not be considered a bug.

asddongmen · May 21 '24