flink-cdc

Flink job will not stop when the mysql database becomes unavailable.

Open ruanhang1993 opened this issue 3 years ago • 5 comments

Is your feature request related to a problem? Please describe.
I submitted a Flink job that reads from a MySQL database and writes to a Kafka cluster using the MySQL CDC connector. The MySQL database went offline after a month, but the Flink job did not exit. Instead, the BinaryLogClient keeps retrying endlessly.

Describe the solution you'd like
I think the MySQL CDC connector should react to this error instead of retrying endlessly. Maybe the job should fail after a few retries.
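To make the requested behavior concrete, here is a minimal sketch of a bounded reconnect loop that gives up and throws after a fixed number of attempts, so the surrounding task can fail. The class and method names (BoundedReconnect, connectWithRetries, RetriesExhaustedException) are hypothetical and are not part of Flink CDC, Debezium, or BinaryLogClient; this only illustrates the idea and is not how the connector is implemented.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Flink CDC or Debezium. */
final class BoundedReconnect {

    /** Thrown once every attempt has failed, so the caller can fail the task. */
    static final class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(int attempts, Throwable lastError) {
            super("Giving up after " + attempts + " failed attempt(s)", lastError);
        }
    }

    /**
     * Calls connect up to maxAttempts times, sleeping for delay between
     * failed attempts. Returns the first successful result, or throws
     * RetriesExhaustedException when all attempts have failed.
     */
    static <T> T connectWithRetries(Callable<T> connect, int maxAttempts, Duration delay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                lastError = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException(attempt, ie);
                }
            }
        }
        throw new RetriesExhaustedException(maxAttempts, lastError);
    }
}
```

With a helper like this, an unreachable database surfaces as an exception after the last attempt instead of an endless stream of reconnect log lines.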

ruanhang1993, Jul 14 '22 06:07

Hi @ruanhang1993,

Have you configured a restart strategy like the one below? The attempts option limits the maximum number of restarts.

restart-strategy: fixed-delay
restart-strategy.fixed-delay.delay: 15s
restart-strategy.fixed-delay.attempts: 10

Jiabao-Sun, Jul 15 '22 07:07
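As a rough illustration of what the fixed-delay settings above mean, the sketch below models a restart budget of 10 attempts with a 15s delay. FixedDelayBudget is a simplified, hypothetical class, not Flink's own implementation. It assumes that each task failure consumes one attempt and that the job fails for good on the first failure after the budget is used up. Note that the budget is only consumed when a task actually fails.

```java
import java.time.Duration;

/** Simplified, hypothetical model of a fixed-delay restart budget; not Flink's class. */
final class FixedDelayBudget {
    private final int maxAttempts;
    private final Duration delay;
    private int failures = 0;

    FixedDelayBudget(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    /** Records one task failure and reports whether another restart is still allowed. */
    boolean onFailure() {
        failures++;
        return failures <= maxAttempts;
    }

    /** Total time spent waiting between restarts if every allowed restart is used. */
    Duration maxTotalDelay() {
        return delay.multipliedBy(maxAttempts);
    }
}
```

Under these assumptions, attempts: 10 with delay: 15s allows ten restarts and adds at most 150 seconds of waiting before the job fails permanently.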

2022-07-21 19:26:04.114 INFO io.debezium.util.Threads - Creating thread debezium-mysqlconnector-mysql_binlog_source-binlog-client
2022-07-21 19:26:34.137 ERROR System.err - Jul 21, 2022 7:26:34 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run WARNING: Failed to restore connection to ipxxxx:portxxxx. Next attempt in 60000ms
2022-07-21 19:27:34.137 ERROR System.err - Jul 21, 2022 7:27:34 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run INFO: Trying to restore lost connection to ipxxxx:portxxxx
2022-07-21 19:27:34.137 INFO io.debezium.util.Threads - Creating thread debezium-mysqlconnector-mysql_binlog_source-binlog-client
2022-07-21 19:28:04.140 ERROR System.err - Jul 21, 2022 7:28:04 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run WARNING: Failed to restore connection to ipxxxx:portxxxx. Next attempt in 60000ms
2022-07-21 19:28:51.247 INFO org.apache.flink.metrics.lcs.shaded.com.xiaomi.infra.galaxy.lcs.common.file.DiskManager - datadir: /home/work/app/lcs-agent/data, usableSpace: 488 GB
2022-07-21 19:29:04.140 ERROR System.err - Jul 21, 2022 7:29:04 PM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run INFO: Trying to restore lost connection to ipxxxx:portxxxx

hezhenghongmail, Jul 22 '22 03:07

The task doesn't exit; Debezium just keeps retrying.

hezhenghongmail, Jul 22 '22 03:07

image

hezhenghongmail, Jul 22 '22 03:07

Thanks @ruanhang1993 and @hezhenghongmail for reporting this.

Jiabao-Sun, Jul 22 '22 03:07

If I set connect.keep.alive = false, could that solve this problem?

lufzhangzitao, Jan 30 '23 06:01
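For anyone who wants to try the setting asked about above, here is a minimal sketch of building the Debezium override as a java.util.Properties object. The class name KeepAliveProps is hypothetical. The sketch assumes the properties are then handed to the MySQL CDC source, for example through the source builder's debeziumProperties(...) method, or with a debezium. prefix in SQL DDL options. Whether disabling keep-alive actually makes the job fail instead of retrying is not confirmed anywhere in this thread.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; only builds the property set. */
final class KeepAliveProps {

    /** Returns Debezium overrides with the binlog client's keep-alive turned off. */
    static Properties debeziumOverrides() {
        Properties props = new Properties();
        // Assumption: "connect.keep.alive" is the Debezium MySQL connector option
        // named in the question; it is passed through unchanged.
        props.setProperty("connect.keep.alive", "false");
        return props;
    }
}
```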

Closing this issue because it was created before version 2.3.0 (2022-11-10). Please try the latest version of Flink CDC to see if the issue has been resolved. If the issue is still valid, kindly report it on Apache Jira under project Flink with component tag Flink CDC. Thank you!

PatrickRen, Feb 28 '24 15:02