
[Bug][cdc-connectors][cdc-base] Transaction log of high_watermark read twice

Open loserwang1024 opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Flink version

1.18

Flink CDC version

3.0

Database and its version

Any (not specific to one database)

Minimal reproduce step

Reason

Currently, the source reads a snapshot split, then backfills the log over [low_watermark, high_watermark], and then reads the stream phase over [high_watermark, +∞). Because both intervals include high_watermark, the record at high_watermark is read twice.
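The overlap can be sketched in a few lines. This is an illustration in Python, not the connector's Java code: integer offsets stand in for log positions, and the `backfill` and `stream` functions and the `(offset, payload)` log model are hypothetical names made up for this sketch, not Flink CDC APIs.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (offset, payload); hypothetical model of a transaction log


def backfill(log: List[Record], low: int, high: int) -> List[Record]:
    """Read the closed interval [low, high], as described in this issue."""
    return [r for r in log if low <= r[0] <= high]


def stream(log: List[Record], start: int, inclusive: bool = True) -> List[Record]:
    """Read from `start` onward; inclusive=False skips the record at `start`."""
    if inclusive:
        return [r for r in log if r[0] >= start]
    return [r for r in log if r[0] > start]


log = [(1, "insert a"), (2, "update a"), (3, "insert b"), (4, "delete a")]
low_watermark, high_watermark = 2, 3

# Behaviour described above: both phases include high_watermark.
current = backfill(log, low_watermark, high_watermark) + stream(log, high_watermark)

# One possible fix (an assumption): start the stream phase strictly after high_watermark.
fixed = backfill(log, low_watermark, high_watermark) + stream(
    log, high_watermark, inclusive=False
)

current_offsets = [offset for offset, _ in current]
fixed_offsets = [offset for offset, _ in fixed]
```

In this sketch `current_offsets` contains offset 3 twice, while `fixed_offsets` contains each offset once. Whether the actual fix should exclude the boundary from the backfill or from the stream phase is not stated in this issue.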

What did you see instead?

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

loserwang1024, Dec 18 '23 11:12

@loserwang1024 Hi, does this issue affect mysql-cdc as well (since it involves cdc-base)? Thanks!

link3280, Feb 27 '24 10:02

@link3280, this is just a minor optimization. To be honest, it rarely happens, because the record at high_watermark is usually a non-DML record, such as a heartbeat record.

loserwang1024, Feb 28 '24 02:02
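A small sketch of why the duplicate read is usually invisible downstream. It is written in Python for illustration and rests on two assumptions not confirmed in this thread: heartbeat records produce no output rows, and the log can be modelled as `(offset, kind, payload)` tuples. The `emitted_rows` function is a hypothetical name, not a Flink CDC API.

```python
from typing import List, Tuple

Record = Tuple[int, str, str]  # (offset, kind, payload); hypothetical log model


def emitted_rows(log: List[Record], low: int, high: int) -> List[str]:
    """Backfill [low, high], then stream [high, +inf), emitting only DML payloads."""
    backfill = [r for r in log if low <= r[0] <= high]
    stream = [r for r in log if r[0] >= high]
    # Assumption: non-DML records such as heartbeats emit nothing downstream.
    return [payload for _, kind, payload in backfill + stream if kind == "dml"]


# high_watermark (offset 3) is a heartbeat: it is read twice but emits nothing.
heartbeat_case = emitted_rows(
    [(2, "dml", "update a"), (3, "heartbeat", ""), (4, "dml", "insert b")], 2, 3
)

# high_watermark (offset 3) is a DML record: the duplicate becomes visible.
dml_case = emitted_rows(
    [(2, "dml", "update a"), (3, "dml", "insert b"), (4, "dml", "delete a")], 2, 3
)
```

Under these assumptions `heartbeat_case` contains each row once, while `dml_case` contains `"insert b"` twice.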

@loserwang1024 Thanks a lot for your input! The reason I ask is that I ran into a data duplication issue with MySQL CDC 3.0.0 when it reads binlogs with the startup position set to earliest-offset or timestamp. Every record showed up exactly twice. I checked the logs and all the splits were MySqlBinlogSplit, so the binlog may have been read twice. I wonder if it's the same issue.

link3280, Feb 28 '24 03:02

Closing this issue as it has been migrated to Apache Jira.

PatrickRen, Apr 09 '24 06:04