
How to recover the job manager from Checkpoints

Open shravangit20 opened this issue 4 years ago • 3 comments

Hi,

How do we recover the job manager from checkpoints instead of savepoints? If there are instructions or steps to follow, please share them.

Thanks, Shravan

shravangit20, Nov 03 '20 03:11

Recovering from checkpoints is transparent to the operator. Flink handles it itself, so you don't need to worry about it.

functicons, Nov 04 '20 19:11

@functicons I am documenting resiliency testing by disrupting task managers and job managers, and I would like to understand how the recovery happens. Is there a way you can help with my testing? Would it be possible to connect with you offline? Also, I have set up a 3-node ZooKeeper ensemble along with the operator and the Flink cluster, but I am having issues setting up the high-availability configuration needed for the disruption testing. I just need some pointers on these two items.

shravangit20, Nov 05 '20 12:11
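
For readers following the high-availability question above, here is a minimal sketch of the Flink properties that ZooKeeper-based HA typically needs. The property keys are standard Flink configuration options; the helper name `ha_properties`, the host names, the storage path, and the cluster id are hypothetical placeholders, and how the resulting map is passed to the cluster (for example through the operator's Flink properties) is an assumption that depends on your setup.

```python
from typing import Dict, List


def ha_properties(quorum_hosts: List[str], storage_dir: str, cluster_id: str) -> Dict[str, str]:
    """Build the Flink properties for ZooKeeper-based high availability.

    quorum_hosts: ZooKeeper "host:port" entries (hypothetical values below).
    storage_dir:  durable location where Flink stores job manager metadata.
    cluster_id:   identifier that separates this cluster's data in ZooKeeper.
    """
    if not quorum_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return {
        "high-availability": "zookeeper",
        "high-availability.zookeeper.quorum": ",".join(quorum_hosts),
        "high-availability.storageDir": storage_dir,
        "high-availability.cluster-id": cluster_id,
    }


if __name__ == "__main__":
    # Hypothetical 3-node ensemble and storage location, for illustration only.
    props = ha_properties(
        ["zk-0.zk:2181", "zk-1.zk:2181", "zk-2.zk:2181"],
        "hdfs:///flink/ha",
        "my-flink-cluster",
    )
    for key, value in props.items():
        print(f"{key}: {value}")
```

With HA configured this way, a restarted job manager can usually look up the latest completed checkpoint on its own, which is the behavior described in the earlier reply.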

@shravangit20 I would be interested in how (or whether) you ultimately solved this. At present, when the job fails, I locate the last available checkpoint and feed it back manually through the fromSavepoint parameter.

benkusak, Sep 21 '21 17:09
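
The manual workaround described in the last comment (finding the most recent checkpoint and passing it to fromSavepoint) can be sketched roughly as follows. This is not part of the operator; it is a hypothetical helper that assumes retained (externalized) checkpoints written to a locally mounted filesystem, using Flink's usual layout of `chk-<n>` directories that each contain a `_metadata` file once complete. The function name `latest_checkpoint` and the example path are illustrative, and an object store such as S3 or GCS would need a different listing call.

```python
import re
from pathlib import Path
from typing import Optional

# Flink names each checkpoint directory "chk-<n>" under <checkpoint-root>/<job-id>/.
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_checkpoint(job_checkpoint_dir: str) -> Optional[Path]:
    """Return the completed checkpoint directory with the highest id, or None.

    A checkpoint is treated as complete only if its "_metadata" file exists,
    so directories that are still being written are skipped.
    """
    best_id = -1
    best_path: Optional[Path] = None
    for entry in Path(job_checkpoint_dir).iterdir():
        match = _CHK_DIR.match(entry.name)
        if match is None or not entry.is_dir():
            continue
        if not (entry / "_metadata").is_file():
            continue
        chk_id = int(match.group(1))  # compare numerically, so chk-10 beats chk-9
        if chk_id > best_id:
            best_id, best_path = chk_id, entry
    return best_path


if __name__ == "__main__":
    # Hypothetical path; substitute the job's actual checkpoint directory.
    found = latest_checkpoint("/mnt/flink/checkpoints/<job-id>")
    print(found if found is not None else "no completed checkpoint found")
```

The returned path is the value one would then supply as fromSavepoint when resubmitting the job. Comparing ids numerically matters here, because a plain string sort would rank `chk-9` above `chk-10`.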