[mysql-cdc] Fix snapshot phase hang when BinaryLogClient is reused
BinaryLogClient keeps its callbacks (eventListeners and lifecycleListeners) in lists, and a BinaryLogClient instance may be reused (see MySqlSplitReader#checkSplitOrStartNext). When multiple snapshot splits are submitted to the same SnapshotSplitReader, the callback lists therefore still contain the MySqlBinlogSplitReadTask#handleEvent callbacks of snapshot splits that have already been processed. When a binlog event arrives, the callbacks of those processed splits are invoked as well, which causes the execute function of the current split's BackfillBinlogReadTask to end before it receives the BINLOG_END watermark event. As a result, the snapshot phase hangs.
The following log is from our online environment. It shows multiple MySqlStreamingChangeEventSource (the superclass of MySqlBinlogSplitReadTask) callbacks belonging to different snapshot splits.
io.debezium.connector.mysql.MySqlStreamingChangeEventSource - XXX: eventListeners(7): com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@61540cca,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@352b5758,io.debezium.connector.mysql.MySqlStreamingChangeEventSource$$Lambda$1014/1247290871@703f0cf,io.debezium.connector.mysql.MySqlStreamingChangeEventSource$$Lambda$1015/190751860@5a253136,io.debezium.connector.mysql.MySqlStreamingChangeEventSource$$Lambda$1016/10641269@12fef255,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@18c84a61,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@55443f, lifecycleListeners(5): com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@61540cca,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@352b5758,io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener@730a6982,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@18c84a61,com.github.shyiko.mysql.binlog.jmx.BinaryLogClientStatistics@55443f
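The accumulation described above can be illustrated with a minimal, self-contained Java sketch. It does not use the real BinaryLogClient: `FakeBinlogClient`, `runSplits`, and the split names are hypothetical stand-ins chosen for illustration, and the `cleanUp` branch shows only one possible remedy (unregistering a split's listener once the split is done), which is not necessarily how this PR fixes the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ReusedClientSketch {

    /** Hypothetical stand-in for a binlog client that stores its listeners in a list. */
    static class FakeBinlogClient {
        private final List<Consumer<String>> eventListeners = new CopyOnWriteArrayList<>();

        void registerEventListener(Consumer<String> listener) {
            eventListeners.add(listener);
        }

        void unregisterEventListener(Consumer<String> listener) {
            eventListeners.remove(listener);
        }

        int listenerCount() {
            return eventListeners.size();
        }

        /** Delivers the event to every listener that is still registered. */
        void dispatch(String event) {
            for (Consumer<String> listener : eventListeners) {
                listener.accept(event);
            }
        }
    }

    /**
     * Processes the given splits one after another on the SAME client.
     * Returns one entry per handler invocation, so stale handlers are visible.
     */
    static List<String> runSplits(FakeBinlogClient client, boolean cleanUp, String... splits) {
        List<String> calls = new ArrayList<>();
        for (String split : splits) {
            Consumer<String> handler = event -> calls.add(split + " handled " + event);
            client.registerEventListener(handler);
            client.dispatch("event-for-" + split);
            if (cleanUp) {
                // Assumed remedy: drop this split's handler before the client is reused.
                client.unregisterEventListener(handler);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        FakeBinlogClient leaky = new FakeBinlogClient();
        List<String> leakyCalls = runSplits(leaky, false, "split-0", "split-1", "split-2");
        // Handlers of finished splits are still registered and still fire.
        System.out.println("without cleanup: " + leaky.listenerCount()
                + " listeners, " + leakyCalls.size() + " handler calls");

        FakeBinlogClient clean = new FakeBinlogClient();
        List<String> cleanCalls = runSplits(clean, true, "split-0", "split-1", "split-2");
        System.out.println("with cleanup: " + clean.listenerCount()
                + " listeners, " + cleanCalls.size() + " handler calls");
    }
}
```

Without cleanup, the event for split-2 is also delivered to the handlers of split-0 and split-1, which mirrors the stale MySqlStreamingChangeEventSource lambdas visible in the log. With cleanup, each event reaches only the handler of the split currently being read.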
We believe the improper use of the MySQL BinaryLogClient is the root cause of some task hang issues, such as #1156.
@leonardBang @kylemeow @minchowang Could you help take a look at this problem?
We just encountered this problem in our online environment. The snapshot stage got stuck, and the problem was resolved after applying this fix. @minchowang
Thanks @lzshlzsh for the detailed report and fix! I'll review this PR as soon as possible.
Hi @lzshlzsh, thanks for your contribution! Before this PR can be merged, could you please rebase it onto the latest master branch?
cc @leonardBang @PatrickRen
This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.