flink-cdc
[Bug][cdc-connectors][cdc-base] Transaction log record at high_watermark is read twice
Search before asking
- [X] I searched in the issues and found nothing similar.
Flink version
1.18
Flink CDC version
3.0
Database and its version
Any
Minimal reproduce step
Reason
Currently, a snapshot split is read first and then backfilled over [low_watermark, high_watermark], after which the stream phase reads over [high_watermark, +∞). Both ranges include high_watermark, so the log record at high_watermark is read twice.
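The overlap can be illustrated with a simplified model. This is a minimal Python sketch, not Flink CDC code: LogRecord, read_split and duplicated_dml are hypothetical names, offsets are plain integers, and the way backfill records are merged into the snapshot result is ignored. It shows that closing both ranges at high_watermark emits that record twice, and that making one of the ranges half-open (here the backfill, [low_watermark, high_watermark)) removes the duplicate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One transaction-log entry (hypothetical model; integer offsets are a simplification)."""
    offset: int
    kind: str  # "DML" or "HEARTBEAT" (hypothetical labels)


def read_split(log, low, high, half_open_backfill):
    """Hypothetical model of the backfill phase followed by the stream phase."""
    if half_open_backfill:
        # Backfill over [low, high): the record at `high` is left to the stream phase.
        backfill = [r for r in log if low <= r.offset < high]
    else:
        # Backfill over [low, high], the behavior described in this issue.
        backfill = [r for r in log if low <= r.offset <= high]
    # Stream phase over [high, +inf).
    stream = [r for r in log if r.offset >= high]
    return backfill + stream


def duplicated_dml(records):
    """Offsets of DML records that were emitted more than once."""
    counts = Counter(r for r in records if r.kind == "DML")
    return sorted(r.offset for r, n in counts.items() if n > 1)


if __name__ == "__main__":
    log = [LogRecord(i, "DML") for i in range(1, 6)]
    print(duplicated_dml(read_split(log, 2, 4, half_open_backfill=False)))  # [4]
    print(duplicated_dml(read_split(log, 2, 4, half_open_backfill=True)))   # []

    # If the record at the high watermark is a heartbeat, the duplicate is not DML.
    hb_log = log[:3] + [LogRecord(4, "HEARTBEAT"), LogRecord(5, "DML")]
    print(duplicated_dml(read_split(hb_log, 2, 4, half_open_backfill=False)))  # []
```

Making the stream start exclusive, (high_watermark, +∞), would remove the duplicate equally well in this model; the sketch does not say which side the actual fix should change. The heartbeat case shows why the duplicate is rarely visible: when the record at high_watermark is not a DML record, reading it twice produces no duplicated row change.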
What did you see instead?
Anything else?
No response
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
@loserwang1024 Hi, does this issue affect mysql-cdc as well (since it involves cdc-base)? Thanks!
@link3280, it's just a minor optimization. To be honest, this rarely happens, because the record at high_watermark is usually a non-DML record, such as a heartbeat record.
@loserwang1024 Thanks a lot for your input! The reason I ask is that I ran into a data duplication issue with MySQL CDC 3.0.0 when it reads binlogs with the position set to earliest-offset or timestamp. The data showed up exactly twice. I checked the logs, and all the splits were MySqlBinlogSplit, so the binlogs may have been read twice. I wonder if it's the same issue.
Closing this issue as it has been migrated to Apache Jira.