kuberay icon indicating copy to clipboard operation
kuberay copied to clipboard

[Feat][RayCluster] Introduce the RayClusterStatus.Conditions field

Open rueian opened this issue 1 year ago • 5 comments
trafficstars

Why are these changes needed?

This is the first part of the process to refine the Ready/Failed status in RayClusterStatus https://docs.google.com/document/d/1bRL0cZa87eCX6SI7gqthN68CgmHaB6l3-vJuIse-BrY

In this first PR, we introduce the new RayClusterStatus.Conditions field, borrowed from the idea of k8s Deployment https://github.com/kubernetes/api/blob/857a946a225f212b64d42c68a7da0dc44837636f/apps/v1/types.go#L532-L542

Related issue number

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • [x] Unit tests
    • [x] Manual tests

rueian avatar Jul 01 '24 13:07 rueian

@rueian @kevin85421 can we have the API change and implementation in the same PR if possible? Otherwise, if we release v1.2 before the implementation we'd have to revert this PR.

andrewsykim avatar Jul 02 '24 16:07 andrewsykim

can we have the API change and implementation in the same PR if possible? Otherwise, if we release v1.2 before the implementation we'd have to revert this PR.

Do you mean adding HeadReady and RayClusterReplicaFailure only when we implement them? cc @andrewsykim

kevin85421 avatar Jul 02 '24 19:07 kevin85421

Do you mean adding HeadReady and RayClusterReplicaFailure only when we implement them?

Yes, and also the addition of the new conditions field in status. We should avoid changing the CRD / API until there is an working / tested minimum implementation. What do you think?

andrewsykim avatar Jul 02 '24 19:07 andrewsykim

@kevin85421 on second thought, I think incremental PRs is fine as long as it's feature gated. I added initial feature gate implementation in https://github.com/ray-project/kuberay/pull/2219, what do you think?

andrewsykim avatar Jul 02 '24 19:07 andrewsykim

I am ok with the feature gate or waiting for the implementation of HeadReady and RayClusterReplicaFailure.

We should avoid changing the CRD / API until there is an working / tested minimum implementation.

Hi @andrewsykim, do we still need to keep the old CRD / API if the gate is disabled? I thought that could only be achieved by introducing a new CRD version.

rueian avatar Jul 03 '24 12:07 rueian

@andrewsykim Does #2219 need to be merged before this PR or the order is flexible?

kevin85421 avatar Jul 05 '24 21:07 kevin85421

@rueian I have synchronized with @andrewsykim today. If the feature flag is off, we will still generate conditions for the CRD, but we will not write anything to conditions. This PR doesn't write anything to conditions, so this PR is independent from #2219.

kevin85421 avatar Jul 08 '24 23:07 kevin85421