[Feature] [api server] Add support for appwrappers

Open z103cb opened this issue 2 years ago • 2 comments

Added support for submitting appwrappers using the Codeflare Operator

Why are these changes needed?

Adding appwrapper support allows the Codeflare API to create Ray clusters and jobs.

Related issue number

Closes #1454

Checks

  • [X] I've made sure the tests are passing.
  • Testing Strategy
    • [X] Unit tests
    • [X] Manual tests
    • [ ] This PR is not tested :(

z103cb avatar Nov 23 '23 09:11 z103cb

here are my main concerns:

  1. We agreed that it's not a choice between MCAD and no MCAD. A flag on each request determines whether MCAD is used for that specific cluster.
  2. For cluster creation, MCAD usage is clear
  3. For RayJob it's more complex. A RayJob can either create a new cluster, in which case MCAD is relevant, or reference an existing cluster, in which case MCAD is probably not relevant. A further complication is that a RayJob consists of a cluster plus a job-submitter job. Is the job submitter part of MCAD resource management?
  4. For serve, it is even more complex. The operator not only creates a cluster, it can also create another cluster in case of any changes.

So we do need to decide in which cases MCAD can/should be used.

blublinsky avatar Nov 25 '23 19:11 blublinsky

I think the requirements were:

  1. Make the apiserver integration "pluggable", as agreed with Anish. I believe the approach taken in this PR achieves that.
  2. For RayJobs, the approach taken for clusters still applies. There are two flavors of job creation:
    • Job + Cluster: both will be "wrapped" with an appwrapper and dispatched when resources are available.
    • Job + Cluster selector: the job will be "wrapped" and dispatched, to keep things simple. One could make things more complicated by checking whether the cluster is actually running and not "wrapping" the job CRD in that case, but until we get requirements to that end, I vote for the judicious application of the KISS principle.
  3. I don't plan on adding MCAD support for the "Serve" endpoints. Queueing makes no sense for the use cases covered by KubeRay Serve.
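
The rules listed above can be summarized as a small decision function. This is only a sketch of the intent, not the code in this PR: the names `RequestKind`, `WrapPlan`, `plan_wrapping`, and the per-request `use_mcad` flag are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class RequestKind(Enum):
    """Hypothetical labels for the apiserver request flavors discussed above."""
    CLUSTER = "cluster"
    JOB_WITH_CLUSTER = "job+cluster"
    JOB_WITH_SELECTOR = "job+cluster-selector"
    SERVE = "serve"


@dataclass(frozen=True)
class WrapPlan:
    """Whether to create an appwrapper, and which resources go inside it."""
    use_appwrapper: bool
    wrapped_resources: Tuple[str, ...]


def plan_wrapping(kind: RequestKind, use_mcad: bool) -> WrapPlan:
    """Decide what gets wrapped, assuming a per-request `use_mcad` flag."""
    # Serve endpoints are never queued; without the flag nothing is wrapped.
    if not use_mcad or kind is RequestKind.SERVE:
        return WrapPlan(False, ())
    if kind is RequestKind.CLUSTER:
        return WrapPlan(True, ("RayCluster",))
    if kind is RequestKind.JOB_WITH_CLUSTER:
        # Job and cluster are wrapped together and dispatched as one unit.
        return WrapPlan(True, ("RayJob", "RayCluster"))
    # Job + cluster selector: wrap only the job, without checking
    # whether the referenced cluster is already running.
    return WrapPlan(True, ("RayJob",))
```

For example, `plan_wrapping(RequestKind.JOB_WITH_SELECTOR, use_mcad=True)` wraps only the job, while any `SERVE` request is left unwrapped regardless of the flag.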

z103cb avatar Nov 27 '23 08:11 z103cb

closing the PR as the work is no longer needed.

z103cb avatar Apr 29 '24 07:04 z103cb