[Bug] Flakes of inconsistency between RayService Object's nested RayCluster state and RayCluster Object state

Open weizhaowz opened this issue 10 months ago • 4 comments

Search before asking

  • [x] I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

Some e2e tests are flaky. After a RayCluster is created, the RayService object's nested RayCluster state can be inconsistent with the RayCluster object's own state. Specifically, RayService.status.activeServiceStatus.rayClusterStatus.state is unset, while RayCluster.status.state is ready and the application's status is ready as well. Since the state field is being replaced with Conditions, should we switch to Conditions in the e2e tests as well?
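For illustration, a Conditions-based readiness check could look roughly like the sketch below. It assumes the object has been loaded as a plain dict (for example from `kubectl get raycluster <name> -o json`) and that `status.conditions` follows the standard Kubernetes condition shape (`type`, `status`). The helper name `is_condition_true` is hypothetical, and the condition type `RayClusterProvisioned` is an assumption that should be checked against the KubeRay version under test.

```python
from typing import Any, Mapping


def is_condition_true(obj: Mapping[str, Any], cond_type: str) -> bool:
    """Return True if obj.status.conditions contains cond_type with status "True"."""
    conditions = (obj.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == cond_type and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical RayCluster object, shaped like `kubectl get ... -o json` output.
# The condition type below is an assumption, not taken from this issue.
ray_cluster = {
    "status": {
        "state": "ready",
        "conditions": [
            {"type": "RayClusterProvisioned", "status": "True"},
        ],
    }
}

print(is_condition_true(ray_cluster, "RayClusterProvisioned"))  # True
print(is_condition_true(ray_cluster, "SomeOtherCondition"))     # False
```

A check like this does not depend on the state string at all, so it would sidestep the unset nested state, provided the Conditions themselves are populated consistently.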

Reproduction script

This bug is a flakiness issue.

  1. create a RayService in an e2e test;
  2. query the values of RayService.status.activeServiceStatus.rayClusterStatus.state and RayCluster.status.state
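The comparison in step 2 can be sketched as follows. This is a minimal illustration, not the actual e2e test code: it assumes both objects were fetched as JSON (for example via `kubectl get ... -o json`) and loaded into dicts, and the function names are hypothetical. Only the two field paths come from this issue.

```python
from typing import Any, Mapping, Optional


def nested_cluster_state(ray_service: Mapping[str, Any]) -> Optional[str]:
    """Read RayService.status.activeServiceStatus.rayClusterStatus.state."""
    status = ray_service.get("status") or {}
    active = status.get("activeServiceStatus") or {}
    cluster_status = active.get("rayClusterStatus") or {}
    # Treat a missing key and an empty string the same way: unset.
    return cluster_status.get("state") or None


def cluster_state(ray_cluster: Mapping[str, Any]) -> Optional[str]:
    """Read RayCluster.status.state."""
    return (ray_cluster.get("status") or {}).get("state") or None


def states_consistent(ray_service: Mapping[str, Any],
                      ray_cluster: Mapping[str, Any]) -> bool:
    """Return True when the nested state matches the RayCluster's own state."""
    return nested_cluster_state(ray_service) == cluster_state(ray_cluster)


# Hypothetical objects reproducing the flaky case described in this issue:
# the nested state is unset while the RayCluster itself reports "ready".
flaky_service = {"status": {"activeServiceStatus": {"rayClusterStatus": {}}}}
ready_cluster = {"status": {"state": "ready"}}

print(nested_cluster_state(flaky_service))              # None
print(cluster_state(ready_cluster))                     # ready
print(states_consistent(flaky_service, ready_cluster))  # False
```

Running a check like this right after the RayService is created makes the mismatch visible whenever the flake occurs.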

Anything else

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

weizhaowz avatar Jan 21 '25 22:01 weizhaowz

@kevin85421 @rueian do you know if this bug overlaps with any of the RayService refactoring we're doing now?

andrewsykim avatar Jan 22 '25 01:01 andrewsykim

I believe the current RayService refactoring doesn't cover this issue. The refactoring focuses on the RayService reconciliation, while the only way to solve the inconsistency is to update RayService.status.activeServiceStatus.rayClusterStatus from the RayCluster reconciliation.

rueian avatar Jan 22 '25 23:01 rueian

Do we switch to the Conditions field, or continue to use the state field for a while?

weizhaowz avatar Jan 23 '25 18:01 weizhaowz

Hi @weizhaowz, do you mind pointing out which test is flaky? Thank you.

Future-Outlier avatar Oct 25 '25 01:10 Future-Outlier