flink-cdc Flink job will not stop when the mysql database becomes unavailable.

Is your feature request related to a problem? Please describe.
I submitted a Flink job that reads from a MySQL database and writes to a Kafka cluster using the MySQL CDC connector. The MySQL database went offline after a month, but the Flink job did not exit. Instead, the BinaryLogClient keeps retrying endlessly.

Describe the solution you'd like
I think the MySQL CDC connector should react to this error instead of retrying endlessly. Maybe the job should fail after a few retries.
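
The behaviour requested above, giving up after a limited number of reconnect attempts, can be sketched as a bounded retry loop. This is only an illustration in plain Java: `BoundedRetry`, `call`, `maxAttempts`, and `delayMillis` are hypothetical names, not part of Flink CDC, Debezium, or the binlog client, and the failing action stands in for a real connection attempt.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Flink CDC or Debezium. */
final class BoundedRetry {

    /**
     * Runs the action until it succeeds or maxAttempts is reached.
     * After the last failed attempt the failure is rethrown, so the
     * caller fails instead of waiting forever.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Wait before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Surface the error so a surrounding job can fail and restart or stop.
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        try {
            // Simulated connect call that always fails, like an offline database.
            BoundedRetry.call(() -> { throw new java.io.IOException("connection refused"); }, 3, 10L);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the final failure is thrown to the caller rather than logged and retried, which is what would allow a job-level restart strategy to take over.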

Jul 14 '22 06:07 ruanhang1993

Hi @ruanhang1993,

Did you configure a restart strategy like the one below? The attempts option limits the maximum number of restart attempts.

restart-strategy: fixed-delay
restart-strategy.fixed-delay.delay: 15s
restart-strategy.fixed-delay.attempts: 10

Jul 15 '22 07:07 Jiabao-Sun

2022-07-21 19:26:04.114 INFO io.debezium.util.Threads - Creating thread debezium-mysqlconnector-mysql_binlog_source-binlog-client
2022-07-21 19:26:34.137 ERROR System.err - Jul 21, 2022 7:26:34 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
WARNING: Failed to restore connection to ipxxxx:portxxxx. Next attempt in 60000ms
2022-07-21 19:27:34.137 ERROR System.err - Jul 21, 2022 7:27:34 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
INFO: Trying to restore lost connection to ipxxxx:portxxxx
2022-07-21 19:27:34.137 INFO io.debezium.util.Threads - Creating thread debezium-mysqlconnector-mysql_binlog_source-binlog-client
2022-07-21 19:28:04.140 ERROR System.err - Jul 21, 2022 7:28:04 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
WARNING: Failed to restore connection to ipxxxx:portxxxx. Next attempt in 60000ms
2022-07-21 19:28:51.247 INFO org.apache.flink.metrics.lcs.shaded.com.xiaomi.infra.galaxy.lcs.common.file.DiskManager - datadir: /home/work/app/lcs-agent/data, usableSpace: 488 GB
2022-07-21 19:29:04.140 ERROR System.err - Jul 21, 2022 7:29:04 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
INFO: Trying to restore lost connection to ipxxxx:portxxxx

Jul 22 '22 03:07 hezhenghongmail

The task doesn't exit; Debezium just keeps retrying.

Jul 22 '22 03:07 hezhenghongmail

Thanks @ruanhang1993 and @hezhenghongmail for reporting this.

Jul 22 '22 03:07 Jiabao-Sun

If I set connect.keep.alive = false, could that solve this problem?
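
For anyone who wants to try this, a minimal sketch of how such an option could be prepared as a plain key/value property is shown here. `KeepAliveProps` and `build` are hypothetical names, and handing the resulting object to the source (for example through a builder method that accepts Debezium properties) is an assumption about the setup. Whether disabling keep-alive actually stops the endless retries is not confirmed anywhere in this thread.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not part of Flink CDC or Debezium. */
final class KeepAliveProps {

    /** Builds a Properties object carrying the option asked about above. */
    static Properties build() {
        Properties props = new Properties();
        // Option name taken from the question; its effect on the retry loop is untested here.
        props.setProperty("connect.keep.alive", "false");
        return props;
    }

    public static void main(String[] args) {
        // Assumption: the caller passes these properties to the CDC source configuration.
        System.out.println(build().getProperty("connect.keep.alive"));
    }
}
```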

Jan 30 '23 06:01 lufzhangzitao

Closing this issue because it was created before version 2.3.0 (2022-11-10). Please try the latest version of Flink CDC to see if the issue has been resolved. If the issue is still valid, kindly report it on Apache Jira under project Flink with component tag Flink CDC. Thank you!

Feb 28 '24 15:02 PatrickRen