
The duration from CDC to Kafka reaches 400ms when 20ms of network latency is injected between CDC and Kafka; the impact is larger than with the MySQL sink

Open Lily2025 opened this issue 11 months ago • 4 comments

What did you do?

1. Create a changefeed with the simple protocol.
2. Run sysbench.
3. Inject 20ms of network latency between CDC and Kafka.

What did you expect to see?

No response

What did you see instead?

The duration from CDC to Kafka reached 400ms when 20ms of network latency was injected between CDC and Kafka, and the changefeed lag increased continuously.

The impact is larger than with the MySQL sink because the Kafka sink has only one worker.

Versions of the cluster

cdc git-hash:8c51dfa5c08543395e564ae526478990105e259b

current status of DM cluster (execute query-status <task-name> in dmctl)

No response

Lily2025 · Mar 14 '24 01:03

/type enhancement /assign asddongmen

Lily2025 · Mar 14 '24 01:03

/remove-area dm /area ticdc

Lily2025 · Mar 14 '24 01:03

By design. The root cause is that the Kafka sink can use only one worker. We will improve this in the long term.

flowbehappy · Apr 08 '24 09:04

To ensure all data from a table are sent sequentially downstream to Kafka, the KafkaSink uses only one worker to produce messages. This means that only one TCP connection is established between CDC and the Kafka server. Therefore, in case of high network latency, the throughput of KafkaSink can significantly deteriorate. We are planning to refactor KafkaSink to use more workers for message production, which should help mitigate this issue. cc @flowbehappy @fubinzh

asddongmen · Apr 30 '24 03:04