flink-on-k8s-operator

job is executed multiple times unintentionally

Open • GilShmaya opened this issue 2 years ago • 1 comment

Hey,

We are encountering an issue in which a job is unintentionally executed multiple times, even though the following comment in the operator's reconciler describes this as an exceptional, unexpected situation: (https://github.com/spotify/flink-on-k8s-operator/blob/v0.4.0-beta.7/controllers/flinkcluster/flinkcluster_reconciler.go#:~:text=//%20This%20is%20an%20exceptional%20situation.)

The scenario:

  1. A logic bug was introduced in a new version of the job code. This version was deployed to the Flink cluster, causing some of the TaskManagers to crash with an exception after a few seconds of runtime. The job then entered a restart loop: it would restart and crash again after a few seconds, repeatedly.
  2. We identified and fixed the bug, and wanted to update the running job with a new JAR containing the fix.

Expected: only one job runs, without errors. Actual: two jobs are running.
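To confirm the symptom described above (two jobs running where one was expected), one could inspect the JobManager's REST API. The following is a minimal sketch, assuming the response of Flink's `GET /jobs/overview` endpoint has already been fetched as a JSON string, and assuming both copies of the job were submitted under the same job name. It treats every job not in a globally terminal state (`FINISHED`, `CANCELED`, `FAILED`) as active and reports names that appear more than once. The job names and IDs in the sample payload are made up for illustration.

```python
import json
from collections import Counter

# Flink's globally terminal job states; any other state (RUNNING, RESTARTING,
# CREATED, ...) is treated here as "still active".
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}


def active_jobs(overview_json: str) -> list:
    """Return jobs from a /jobs/overview payload that are not in a terminal state."""
    jobs = json.loads(overview_json).get("jobs", [])
    return [job for job in jobs if job.get("state") not in TERMINAL_STATES]


def duplicate_job_names(overview_json: str) -> list:
    """Return (sorted) job names that occur more than once among active jobs."""
    counts = Counter(job.get("name", "") for job in active_jobs(overview_json))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical payload shaped like the JobManager's GET /jobs/overview response.
    sample = json.dumps({
        "jobs": [
            {"jid": "a1", "name": "my-job", "state": "RESTARTING"},
            {"jid": "b2", "name": "my-job", "state": "RUNNING"},
            {"jid": "c3", "name": "old-job", "state": "CANCELED"},
        ]
    })
    print(len(active_jobs(sample)))
    print(duplicate_job_names(sample))
```

Keeping the HTTP fetch out of the functions makes them easy to test; in practice the payload would come from the JobManager's REST port.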

After that, when we tried to cancel the unexpected job, the Flink cluster was canceled as well.

Thanks, Gil

GilShmaya, Apr 11 '22 10:04

Hey @GilShmaya

  1. If you deploy your FlinkCluster as a Job Cluster / Application Cluster, cancelling the job from the Flink console will cancel the cluster too. That is the intended behaviour.
  2. Try a Session Cluster (sample) if you want to reuse your FlinkCluster for different jobs. You would then have to submit your jobs to the JobManager yourself, using the Flink CLI.
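With a Session Cluster, submission is done by hand with the Flink CLI, as suggested above. The following is a minimal sketch that only assembles a `flink run` command line; the `-m` (JobManager address), `-d` (detached), `-c` (entry class) and `-p` (parallelism) options are standard `flink run` options. The JobManager address, JAR path and entry class in the example are hypothetical, and the sketch assumes the operator exposes the JobManager under a service named `<cluster-name>-jobmanager`, which may differ in your setup.

```python
import shlex
import subprocess
from typing import List, Optional


def build_flink_run_command(jobmanager: str, jar_path: str,
                            entry_class: Optional[str] = None,
                            parallelism: Optional[int] = None,
                            detached: bool = True) -> List[str]:
    """Assemble a `flink run` invocation targeting a specific JobManager."""
    cmd = ["flink", "run", "-m", jobmanager]
    if detached:
        cmd.append("-d")  # return right after submission instead of waiting for the job
    if entry_class:
        cmd += ["-c", entry_class]
    if parallelism is not None:
        cmd += ["-p", str(parallelism)]
    cmd.append(jar_path)  # the JAR comes after all CLI options
    return cmd


def submit(cmd: List[str]) -> None:
    # Requires the Flink distribution's `flink` binary on PATH.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_flink_run_command(
        "flinksessioncluster-sample-jobmanager:8081",  # hypothetical service address
        "/jars/my-job-fixed.jar",                       # hypothetical JAR path
        entry_class="com.example.MyJob",               # hypothetical entry class
    )
    print(shlex.join(cmd))
```

In a session cluster nothing stops a second submission of the same JAR, so when upgrading, the previous job would need to be canceled first (for example with `flink cancel <jobId>`) to avoid ending up with two copies running.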

live-wire, Apr 25 '22 12:04