[RayCluster][Fix] evicted head-pod can be recreated or restarted

Open JasonChen86899 opened this issue 1 year ago • 5 comments

Why are these changes needed?

This PR attempts to fix issue https://github.com/ray-project/kuberay/issues/2125. If the head pod has been evicted, we delete it so that it can be restarted or recreated.

Related issue number

https://github.com/ray-project/kuberay/issues/2125

Checks

  • [ ] I've made sure the tests are passing.
  • Testing Strategy
    • [x] Unit tests
    • [ ] Manual tests
    • [ ] This PR is not tested :(

JasonChen86899 avatar Jul 02 '24 15:07 JasonChen86899

Hey @JasonChen86899, I didn't know that you wanted to work on the issue. I had already assigned the issue to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.

kevin85421 avatar Jul 02 '24 15:07 kevin85421

@kevin85421 I am OK with it if @JasonChen86899 wants to create a PR.

MortalHappiness avatar Jul 02 '24 16:07 MortalHappiness

@MortalHappiness Thanks!

kevin85421 avatar Jul 02 '24 16:07 kevin85421

> Hey @JasonChen86899, I didn't know that you wanted to work on the issue. I had already assigned the issue to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.

@kevin85421 Sorry, I just made a draft and didn't notice that the issue was assigned. I have closed it. cc @MortalHappiness

JasonChen86899 avatar Jul 02 '24 16:07 JasonChen86899

@JasonChen86899 No worries. I have already synced with @MortalHappiness. He is comfortable with your PR. We can review this PR together. You don't need to close it.

kevin85421 avatar Jul 02 '24 16:07 kevin85421

Is this PR ready for review? It is still marked as a draft.

kevin85421 avatar Jul 10 '24 05:07 kevin85421

> Is this PR ready for review? It is still marked as a draft.

Marked as ready for review.

JasonChen86899 avatar Jul 10 '24 09:07 JasonChen86899

Based on this logic, I updated the shouldDelete function: https://github.com/kubernetes/kubernetes/blob/46aa8959a0659e22c924bb52b38385d441715b2b/pkg/kubelet/kubelet_pods.go#L1556

  1. Terminated state (ended or successful): delete it.
  2. Running state: if the restart policy is Never, delete it.
  3. All other cases: do not delete.
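
As a rough illustration of the three rules above (not the code in this PR), the decision can be sketched in Go. The `PodPhase` and `RestartPolicy` types and the `shouldDeleteSketch` name are hypothetical stand-ins, not the Kubernetes `corev1` types or the KubeRay function, and mapping "terminated" to the `Succeeded` and `Failed` phases is an assumption.

```go
package main

// PodPhase and RestartPolicy are simplified, hypothetical stand-ins for the
// Kubernetes types, defined here only to keep the sketch self-contained.
type PodPhase string
type RestartPolicy string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"

	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
)

// shouldDeleteSketch applies the three rules listed in the comment:
//  1. a terminated pod (assumed here to mean Succeeded or Failed) is deleted;
//  2. a running pod is deleted only when its restart policy is Never;
//  3. every other combination is left alone.
func shouldDeleteSketch(phase PodPhase, policy RestartPolicy) bool {
	switch phase {
	case PodSucceeded, PodFailed:
		return true
	case PodRunning:
		return policy == RestartNever
	default:
		return false
	}
}
```

For example, `shouldDeleteSketch(PodRunning, RestartAlways)` returns false, so under these rules a running head pod with restart policy Always is left for the kubelet to restart in place.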

@kevin85421 @andrewsykim

JasonChen86899 avatar Jul 15 '24 11:07 JasonChen86899

I will revisit this PR this week. I hope to include this PR in v1.2.0. I will cut the branch next week.

kevin85421 avatar Jul 24 '24 06:07 kevin85421

cc @MortalHappiness would you mind reviewing this PR?

kevin85421 avatar Aug 06 '24 04:08 kevin85421