flink-on-k8s-operator How to start job after jobmanager fails for whatever reason?

Open frenkdefrog opened this issue 3 years ago • 1 comments

Hi folks, I am still getting familiar with the Flink operator, and I would like to ask for your help with the following question. After I start a new Flink job cluster, a new pod comes up that submits the job to the JobManager. After a while that pod goes into the Completed state, and the job keeps running. In my use case there is no persistent volume in the cluster, and there is no need to set up savepoints. All I want is to make sure that the job is started again whenever the JobManager fails. I don't need to restore anything, just run the job again. Is there any way to do this other than setting up a persistent volume and a savepointsDir with auto savepoints?

May 06 '21 13:05 frenkdefrog

Set the HA properties for the JobManager in flink-conf.yaml, as described in https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/. With those set, a restarted JobManager should recover its state from the remote storage.

May 22 '21 16:05 stolendog
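
For reference, a minimal sketch of the HA properties that the linked Kubernetes HA page describes for Flink releases of that period. The cluster ID and the storage path below are placeholders, not values from this thread, and how the properties reach flink-conf.yaml depends on how the operator is configured. Note that `high-availability.storageDir` has to point at storage that survives a JobManager restart (for example an object store), so this approach still needs some form of remote storage even without a persistent volume.

```yaml
# Sketch only: Kubernetes HA settings as documented for Flink 1.12/1.13.
# "my-flink-cluster" and the s3:// path are placeholder values.
kubernetes.cluster-id: my-flink-cluster
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# JobManager metadata (job graphs, checkpoint pointers) is written here
# and read back when a new JobManager takes over.
high-availability.storageDir: s3://my-bucket/flink/recovery
```

According to the same documentation, the service account that runs the JobManager also needs permission to create, edit, and delete ConfigMaps, since the Kubernetes HA services store leader information there.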
