[RayCluster][Fix] Add expectations of RayCluster
Why are these changes needed?
This PR attempts to address issues #715 and #1936 by adding expectation capabilities, so that the next Reconcile after a pod creation or deletion sees those pods in their expected state instead of a stale cache view.
Similar solutions for reference:
- https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L146
- https://github.com/openkruise/kruise/tree/master/pkg/util/expectations
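For readers unfamiliar with the pattern, here is a minimal, illustrative sketch of the expectation mechanism referenced above (the type and method names below are placeholders, not necessarily the identifiers used in this PR): the controller records how many pod operations it has issued per RayCluster and skips further scaling until the informer cache has observed all of them.

```go
// Sketch only: a ReplicaSet-style, count-based expectation store.
package expectations

import "sync"

type counts struct {
	creations int
	// Deletion tracking (ExpectDeletions/DeletionObserved) is symmetric
	// to creation tracking and omitted here for brevity.
	deletions int
}

// ScaleExpectations tracks in-flight pod creations and deletions,
// keyed by "<namespace>/<rayClusterName>".
type ScaleExpectations struct {
	mu    sync.Mutex
	items map[string]*counts
}

func NewScaleExpectations() *ScaleExpectations {
	return &ScaleExpectations{items: map[string]*counts{}}
}

// ExpectCreations is called right before the reconciler creates n pods.
func (e *ScaleExpectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.item(key).creations += n
}

// CreationObserved is called from the pod event handler once a create
// event for this RayCluster reaches the informer cache.
func (e *ScaleExpectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if c, ok := e.items[key]; ok && c.creations > 0 {
		c.creations--
	}
}

// Satisfied reports whether every issued operation has been observed.
// The reconciler should not make new scaling decisions until it is true.
func (e *ScaleExpectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	c, ok := e.items[key]
	return !ok || (c.creations <= 0 && c.deletions <= 0)
}

func (e *ScaleExpectations) item(key string) *counts {
	if _, ok := e.items[key]; !ok {
		e.items[key] = &counts{}
	}
	return e.items[key]
}
```

In this scheme, the reconcile loop returns early while Satisfied is false, calls ExpectCreations immediately before issuing pod creations, and wires CreationObserved into the pod informer's event handlers.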
Related issue number
- #715
- #1936
Checks
- [ ] I've made sure the tests are passing.
- Testing Strategy
- [x] Unit tests
- [ ] Manual tests
- [ ] This PR is not tested :(
Hi @Eikykun, thank you for the PR! I will review it next week. Are you on Ray Slack? We can iterate more quickly there since this is a large PR. My Slack handle is "Kai-Hsun Chen (ray team)". Thanks!
I will review this PR tomorrow.
cc @rueian Would you mind giving this PR a review? I think I don't have enough time to review it today. Thanks!
Just wondering if the client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?
> Just wondering if the client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?
Apologies, I'm not quite clear about what "related informer cache" refers to.
> Just wondering if the client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?

> Apologies, I'm not quite clear about what "related informer cache" refers to.
According to https://github.com/ray-project/kuberay/issues/715, the root cause is the stale informer cache, so I am wondering if the issue can be solved by fixing the cache, for example doing a manual Resync somehow.
I am reviewing this PR now. I will try to do a review iteration every 1 or 2 days.
Btw, @Eikykun would you mind rebasing with the master branch and resolving the conflict? Thanks!
> According to #715, the root cause is the stale informer cache, so I am wondering if the issue can be solved by fixing the cache, for example doing a manual Resync somehow.
Got it. From a problem-solving standpoint, if we don't rely on an informer in the controller and directly query the ApiServer for pods, the cache consistency issue with etcd wouldn't occur. However, this approach would increase network traffic and affect reconciliation efficiency.
As far as I understand, the Resync() method in DeltaFIFO is not intended to ensure cache consistency with etcd, but rather to prevent event loss by means of periodic reconciliation.
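As a concrete illustration of that trade-off, here is a small sketch using controller-runtime (not code from this PR; the function and its wiring are hypothetical): the default client serves reads from the informer cache, while GetAPIReader() bypasses it and queries the API server directly.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listPodsBothWays contrasts a cached read with a direct API-server read.
func listPodsBothWays(ctx context.Context, mgr ctrl.Manager, namespace string) error {
	var cached corev1.PodList
	// Served from the shared informer cache: cheap, but may lag behind etcd,
	// which is exactly the staleness problem discussed in #715.
	if err := mgr.GetClient().List(ctx, &cached, client.InNamespace(namespace)); err != nil {
		return err
	}

	var live corev1.PodList
	// Bypasses the cache and hits the API server: always fresh, but adds
	// request load and latency if done on every reconcile.
	if err := mgr.GetAPIReader().List(ctx, &live, client.InNamespace(namespace)); err != nil {
		return err
	}
	return nil
}
```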
> Btw, @Eikykun would you mind rebasing with the master branch and resolving the conflict? Thanks!
Thanks for your review. I will go through the PR issues and resolve the conflicts later.
@Eikykun would you mind installing pre-commit https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md and fixing the linter issues? Thanks!
At a quick glance, it seems that we create an ActiveExpectationItem for each Pod's creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In ReplicaSet's source code, it seems to only track the number of Pods expected to be created or deleted per ReplicaSet.
Follow-up for ^
> At a quick glance, it seems that we create an ActiveExpectationItem for each Pod's creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In ReplicaSet's source code, it seems to only track the number of Pods expected to be created or deleted per ReplicaSet.
Sorry, I didn't have time to reply a few days ago.
ActiveExpectationItem is removed after fulfilling its expectations. Therefore, the memory usage depends on how many pods being created or deleted have not yet been synchronized to the cache. It might not actually consume much memory? Also, ControllerExpectations caches each pod's UID: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L364
Therefore, I'm not quite sure which one is lighter, ActiveExpectationItem or ControllerExpectations.
I started with ControllerExpectations in RayCluster at the beginning, but I'm not entirely sure why I switched to ActiveExpectationItem; perhaps ControllerExpectations was more complicated to integrate: it requires a PodEventHandler to handle the Observed logic, so RayCluster would need to implement that PodEventHandler logic separately.
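To make the comparison concrete, here is a rough sketch (hypothetical names, not the PR's actual types) of a per-item expectation in the spirit of ActiveExpectationItem: one entry per in-flight pod operation, deleted as soon as the corresponding event is observed, so the map size is bounded by the number of pods currently being created or deleted but not yet synchronized to the cache.

```go
package expectations

import "sync"

type scaleAction string

const (
	actionCreate scaleAction = "Create"
	actionDelete scaleAction = "Delete"
)

// itemExpectations keeps one record per pending pod operation,
// keyed by "<namespace>/<podName>".
type itemExpectations struct {
	mu    sync.Mutex
	items map[string]scaleAction
}

func newItemExpectations() *itemExpectations {
	return &itemExpectations{items: map[string]scaleAction{}}
}

// Expect records that a create or delete has been issued for this pod.
func (e *itemExpectations) Expect(podKey string, action scaleAction) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.items[podKey] = action
}

// Observe drops the record once the event shows up in the informer cache,
// so the map only ever holds the not-yet-synchronized pods.
func (e *itemExpectations) Observe(podKey string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.items, podKey)
}

// Pending returns the number of operations the cache has not caught up with.
func (e *itemExpectations) Pending() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.items)
}
```

The count-based ControllerExpectations approach stores two integers per controller, while this per-item approach stores one small record per in-flight pod, so the difference matters mainly when many pods are being scaled at once.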
Could you help approve the workflow? cc @kevin85421 If there are any questions, concerns, or additional changes needed on my part, please let me know. I am happy to assist in any way possible to expedite the process :)
@Eikykun, thank you for following up! Sorry for the late review. I had concerns about merging such a large change before Ray Summit. Now, I have enough time to verify the correctness and stability of this PR. This is also one of the most important stability improvements in the post-summit roadmap.
https://docs.google.com/document/d/1YYuAQkHKz2UTFMnTDJLg4qnW2OAlYQqjvetP_nvt0nI/edit?tab=t.0
I will resume reviewing this PR this week.
cc @MortalHappiness can you also give this PR a pass of review?
A few questions I'd like to ask:
- Is ExpectedResourceType = "RayCluster" actually used? So far, I've only seen "Pod" being used.
- Your implementation and the code in the description differ quite a bit, so I'm curious:
  - If only "Pod" is being used, why separate it into ActiveExpectation and RayClusterExpectation?
  - Why have implementations for both, with RayClusterExpectation composing ActiveExpectation? From what I can see, it seems like there only needs to be one. It feels more similar to ScaleExpectations, which mainly expects two operations: "Create" and "Delete."
> A few questions I'd like to ask:
>
> - Is ExpectedResourceType = "RayCluster" actually used? So far, I've only seen "Pod" being used.
> - Your implementation and the code in the description differ quite a bit, so I'm curious:
>   - If only "Pod" is being used, why separate it into ActiveExpectation and RayClusterExpectation?
>   - Why have implementations for both, with RayClusterExpectation composing ActiveExpectation? From what I can see, it seems like there only needs to be one. It feels more similar to ScaleExpectations, which mainly expects two operations: "Create" and "Delete."
This might be an issue left over from the last simplification. Initially, I added many types like RayCluster, Service, etc., considering that resources other than scaled pods might also require expectations. If we only consider the scaling logic for each group, we can significantly simplify the code. In fact, I recently streamlined the code and reduced the scaling expectation code to around 100 lines. You can find it in the latest commit.
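For reference, the simplified shape could look roughly like the interface below. This is a hypothetical sketch based on the discussion (group-scoped, with only the "Create" and "Delete" operations); the actual method set and signatures in the PR may differ.

```go
package expectations

// ScaleAction is one of the two operations the expectation tracks.
type ScaleAction string

const (
	Create ScaleAction = "Create"
	Delete ScaleAction = "Delete"
)

// RayClusterScaleExpectation sketches the minimal surface the reconciler
// needs: record an in-flight pod operation for a worker group, mark it
// observed from the pod event handler, and check whether it is safe to
// compute the next scaling decision for that group.
type RayClusterScaleExpectation interface {
	ExpectScalePod(namespace, rayClusterName, group, podName string, action ScaleAction)
	ObserveScalePod(namespace, rayClusterName, group, podName string)
	IsSatisfied(namespace, rayClusterName, group string) bool
}
```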
By the way, maybe you missed this because this comment is folded. https://github.com/ray-project/kuberay/pull/2150#discussion_r1843097987
Could you either:
- Change ExpectScalePod to return an error, just like how Kubernetes does.
- Add some comments to say that errors are ignored because it panics instead of returning an error.
> By the way, maybe you missed this because this comment is folded. #2150 (comment)
>
> Could you either:
>
> - Change ExpectScalePod to return an error, just like how Kubernetes does.
> - Add some comments to say that errors are ignored because it panics instead of returning an error.
Thank you for your patient review. I have added some comments.
@kevin85421 Can you merge the PR?
Merged. Thank you for the contribution!