Adding serve support for API server
Why are these changes needed?
Extending the API server to support Serve.
Related issue number
Checks
- [x] I've made sure the tests are passing.
- Testing Strategy
- [x] Unit tests
- [x] Manual tests
- [ ] This PR is not tested :(
@tedhtchang, @z103cb please take a look
cc: @kevin85421
@kevin85421 it's ready for you
The CI fails consistently. Would you mind taking a look? Thanks!
This test only sometimes passes locally:

```
pytest --log-level=info -v -s tests/test_sample_rayservice_yamls.py::TestRayService::test_zero_downtime_rollout
```

Sometimes it fails with:

```
Failed to connect to rayservice-sample-serve-svc.default.svc.cluster.local port 8000: Connection refused
command terminated with exit code 7
....
INFO framework.utils:utils.py:375 Execute command (check_output): kubectl exec curl -n default -- curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000/fruit/ -d '["MANGO", 2]'
INFO framework.utils:utils.py:382 Exception: b''
```
or
```
....
INFO test_sample_rayservice_yamls:test_sample_rayservice_yamls.py:149 Ray service transitioned to status Running.
INFO test_sample_rayservice_yamls:test_sample_rayservice_yamls.py:154 Ray service has moved to cluster "rayservice-sample-raycluster-xtvwr"
INFO framework.utils:utils.py:375 Execute command (check_output): kubectl exec curl -n default -- curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000/fruit/ -d '["MANGO", 2]'
INFO framework.utils:utils.py:379 Output: b'6'
============================================================================== short test summary info ==============================================================================
FAILED tests/test_sample_rayservice_yamls.py::TestRayService::test_zero_downtime_rollout - AssertionError
```
And I do not think this PR changes anything there.
Hmm, I'm not referring to the test-rayservice-sample-yamls-latest-release. More than one CI check has failed. You may need to rebase the branch or open a new PR instead.
@kevin85421 we are all set
Sorry @kevin85421, I completely disagree with your statements; see point-by-point replies below.
> Perhaps the KubeRay API server should construct its own HTTP client instead of using the existing one in KubeRay.

I do not think it makes sense to maintain two copies of identical code.

> The dashboard HTTP client in KubeRay is not a user-facing API, so no one promises its backward compatibility. Constructing a new one in the KubeRay API server can improve the stability.

It is not user-facing in the API server either. The API server uses these APIs to communicate with the Ray cluster and then converts the results to the internal representation defined by its own APIs.

> Reducing the dependencies between the KubeRay API server and KubeRay can avoid a lot of issues for users and developers. For example, the KubeRay API server may want to support multiple versions of KubeRay and Ray. That is hard to achieve if there are a lot of dependencies between them.

Theoretically that might be right, but in practice it will lead to huge code duplication. The API server already depends on the KubeRay operator.

> The fact that these functions are not used by KubeRay means they are not tested by end-to-end tests and not utilized by users. Therefore, it is difficult to ascertain whether they work or not.

That is why I was testing them. Besides, they are used by the KubeRay operator.

> Using the dashboard HTTP client in KubeRay for communication with Ray doesn't seem to make much sense to me. The KubeRay API server can directly talk to Ray.

It does talk to Ray directly, using the dashboard HTTP client as the code to do so.
@kevin85421, so where do we go from here? We disagree; should we ask for a third opinion?
@kevin85421 we need to come to some kind of resolution here; leaving it hanging is not a solution. @architkulkarni? Also, Ray job support uses the same class.
@kevin85421 it has been over a month. Any chance you can get to this?
I will remove all functions in dashboard_httpclient.go that are not used by the KubeRay operator itself before the release of KubeRay v1.1.0. I don't want any unnecessary dependencies between KubeRay and the KubeRay API server. In addition, I may also consider moving the API server to a separate repository.
We have three possible options to clean up dashboard_httpclient.go:
- I revert #1639.
- I open a PR for a dashboard client for API server.
- You can open a PR for a dashboard client for API server.
> I will remove all functions in dashboard_httpclient.go that are not used by the KubeRay operator itself before the release of KubeRay v1.1.0. I don't want any unnecessary dependencies between KubeRay and the KubeRay API server. In addition, I may also consider moving the API server to a separate repository.
There are two methods added there (see the sketch below):
- `DeployApplication(context.Context, string) error`, which was copied from the driver and made simpler and cleaner; a trivial 10 lines of code.
- `DeleteServeApplications(context.Context) error`, which is 15 trivial lines of code.

The biggest change is updating the return definition to align it with the Serve implementation's definition.
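For reference, a rough sketch of what these two methods could look like, assuming the client talks to Ray's Serve REST API (`PUT`/`DELETE` on `/api/serve/applications/` in Ray 2.x). The `RayDashboardClient` type and its fields here are illustrative, not the actual PR code:

```go
package dashboardclient

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
)

// RayDashboardClient is an illustrative stand-in for the shared
// dashboard HTTP client discussed in this thread.
type RayDashboardClient struct {
	client       *http.Client
	dashboardURL string // e.g. "http://<head-svc>:8265"
}

// DeployApplication submits a Serve config to the Ray dashboard
// (PUT /api/serve/applications/ in Ray 2.x).
func (r *RayDashboardClient) DeployApplication(ctx context.Context, configJSON string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		r.dashboardURL+"/api/serve/applications/", bytes.NewBufferString(configJSON))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := r.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("deploy failed: %s", resp.Status)
	}
	return nil
}

// DeleteServeApplications shuts down all Serve applications
// (DELETE /api/serve/applications/).
func (r *RayDashboardClient) DeleteServeApplications(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodDelete,
		r.dashboardURL+"/api/serve/applications/", nil)
	if err != nil {
		return err
	}
	resp, err := r.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("delete failed: %s", resp.Status)
	}
	return nil
}
```

Both are thin wrappers over the same HTTP plumbing, which is the crux of the reuse-versus-duplication disagreement above.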
Do whatever you have to do. This is taking way too long.
@kevin85421 I moved the API-server-specific version of dashboard_httpclient.go and its supporting files to the API server. Can we please finally merge it?
@kevin85421 the only change to the Ray operator is in pod.go: capitalizing initLivenessAndReadinessProbe to make it visible (exported) outside the package.
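For readers less familiar with Go: visibility is determined by an identifier's first letter, so the rename is the entire change. An illustrative sketch (package name and empty bodies are placeholders, not the real pod.go contents):

```go
package pod

// Package-private: a lowercase first letter means this function
// cannot be referenced from outside the pod package.
func initLivenessAndReadinessProbe() {}

// Exported: capitalizing the first letter makes the function
// visible to other packages, such as the API server's code.
func InitLivenessAndReadinessProbe() {}
```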