flink-on-k8s-operator
job is executed multiple times unintentionally
Hey,
We are hitting an issue in which a job is executed multiple times unintentionally, even though a comment in the reconciler describes this as an exceptional situation that should not happen. (https://github.com/spotify/flink-on-k8s-operator/blob/v0.4.0-beta.7/controllers/flinkcluster/flinkcluster_reconciler.go#:~:text=//%20This%20is%20an%20exceptional%20situation.)
The scenario:
1 - A new job version introduced a logic bug in the job code. This version was deployed to the Flink cluster, and some of the TaskManagers crashed with an exception after a few seconds of runtime. This started a restart loop: the job would try to re-run and crash again after a few seconds, repeatedly.
2 - The bug was identified and fixed, and we want to update the running job with a new JAR that contains the fix.

Expected: only one job runs, without errors.
Actual: two jobs are up.
After that, when we try to cancel the unexpected job, the Flink cluster is cancelled as well.
Thanks, Gil
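To confirm the "two jobs are up" symptom without relying on the web UI, one option is to query the JobManager's REST endpoint `/jobs/overview` and count the entries whose state is `RUNNING`. The following is only a minimal sketch: the helper names (`fetch_jobs_overview`, `running_jobs`, `duplicate_running_names`) are hypothetical, the base URL is an assumption about how the JobManager is exposed in your setup, and it assumes both copies of the job share the same job name.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_jobs_overview(base_url, timeout=10.0):
    """Fetch the parsed JSON body of GET <base_url>/jobs/overview.

    base_url is an assumption, e.g. a port-forwarded JobManager
    such as "http://localhost:8081".
    """
    url = base_url.rstrip("/") + "/jobs/overview"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def running_jobs(overview):
    """Return only the job entries whose state is RUNNING."""
    return [job for job in overview.get("jobs", []) if job.get("state") == "RUNNING"]


def duplicate_running_names(overview):
    """Return the sorted job names that have more than one RUNNING instance."""
    counts = Counter(job["name"] for job in running_jobs(overview))
    return sorted(name for name, count in counts.items() if count > 1)
```

With a port-forward to the JobManager in place, `duplicate_running_names(fetch_jobs_overview("http://localhost:8081"))` would list any job name that is currently running more than once; an empty list means no duplicates were seen at that moment.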
Hey @GilShmaya
- If you deploy your FlinkCluster as a Job Cluster / Application Cluster, cancelling the job from the Flink console will cancel the cluster too. (That is the intended behaviour.)
- Try a Session Cluster (sample) if you want to reuse your FlinkCluster for different jobs. (You would have to submit your jobs to the JobManager yourself, using the Flink CLI.)
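For the Session Cluster route, submitting a job yourself usually comes down to a `flink run` invocation pointed at the JobManager. Below is a rough sketch of assembling that command from a script. The function name `build_flink_run_command` is hypothetical, and the JobManager address and JAR path in the usage note are placeholders, not values from this thread; it also assumes the `flink` binary is on your `PATH` and that the JobManager is reachable from where you run it.

```python
def build_flink_run_command(jobmanager, jar_path, program_args=(), detached=True):
    """Build the argument list for submitting a JAR with the Flink CLI.

    jobmanager is a "host:port" string for the session cluster's
    JobManager; jar_path is the local path of the job JAR.
    """
    cmd = ["flink", "run"]
    if detached:
        # Return right after submission instead of waiting for the job to finish.
        cmd.append("--detached")
    cmd += ["--jobmanager", jobmanager, jar_path]
    # Anything after the JAR path is passed to the job's main method.
    cmd += list(program_args)
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, for example with `build_flink_run_command("localhost:8081", "./my-job.jar")`. Because the job is submitted separately from the cluster, cancelling it should leave the session cluster running, which is the reuse behaviour described above.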