
Split head memory and cpu requests/limits

Open Bobbins228 opened this issue 1 year ago • 6 comments

Issue link

Closes: RHOAIENG-9259

What changes have been made

Split the head cpu and memory resources into requests/limits, similar to #547. Added deprecation warnings to the old vars head_cpus and head_memory.

Verification steps

Setup

Notebook server ODH/RHOAI/Local

  • Clone this repository with git clone https://github.com/project-codeflare/codeflare-sdk.git
  • Checkout this PR's branch
  • Run poetry build - install if needed (pip install poetry)
  • Run pip install --force-reinstall dist/codeflare_sdk-0.0.0.dev0-py3-none-any.whl
  • Restart your notebook kernel

Testing

Testing the deprecated args head_cpus and head_memory

Follow through the basic Ray demo. Set the head_cpus and head_memory parameters to values of your choosing. You should get a warning that the parameters are being deprecated and that you should use the new ones.

The head cpu and memory requests and limits should equal the values you entered above.
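For reviewers who want to see the expected fallback behaviour in isolation, here is a minimal sketch. It is not the SDK's code: HeadResources is a hypothetical stand-in class, the default values are placeholders, and the warning text is invented for illustration. Only the six field names come from this PR.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HeadResources:  # hypothetical stand-in, not codeflare-sdk's ClusterConfiguration
    head_cpu_requests: Union[int, str] = 2      # placeholder default
    head_cpu_limits: Union[int, str] = 2        # placeholder default
    head_memory_requests: Union[int, str] = 8   # placeholder default
    head_memory_limits: Union[int, str] = 8     # placeholder default
    head_cpus: Optional[Union[int, str]] = None    # deprecated
    head_memory: Optional[Union[int, str]] = None  # deprecated

    def __post_init__(self):
        # If an old argument is supplied, warn and copy it to both
        # the request and the limit of the matching resource.
        if self.head_cpus is not None:
            warnings.warn(
                "head_cpus is being deprecated, use head_cpu_requests and head_cpu_limits",
                DeprecationWarning,
            )
            self.head_cpu_requests = self.head_cpu_limits = self.head_cpus
        if self.head_memory is not None:
            warnings.warn(
                "head_memory is being deprecated, use head_memory_requests and head_memory_limits",
                DeprecationWarning,
            )
            self.head_memory_requests = self.head_memory_limits = self.head_memory
```

With this sketch, HeadResources(head_cpus=4, head_memory=16) emits two warnings and leaves both cpu fields at 4 and both memory fields at 16.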

Testing the new requests/limits args

In the ClusterConfiguration, add the parameters:

  • head_cpu_requests
  • head_cpu_limits
  • head_memory_requests
  • head_memory_limits

Set them to values of your choosing and the head pod of the Ray Cluster should reflect these values.
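As a rough guide to what "reflect these values" means, the sketch below builds the resources block that a Kubernetes container spec uses for requests and limits. The helper name head_container_resources is hypothetical, and how the SDK formats the values (for example, whether it appends a unit to memory) is not shown in this PR description, so the values are passed through unchanged.

```python
def head_container_resources(cpu_requests, cpu_limits, memory_requests, memory_limits):
    """Build a Kubernetes-style resources block from the four head values.

    Hypothetical helper for illustration; values are passed through as given.
    """
    return {
        "requests": {"cpu": cpu_requests, "memory": memory_requests},
        "limits": {"cpu": cpu_limits, "memory": memory_limits},
    }


# Example values chosen arbitrarily for the check.
expected = head_container_resources(1, 2, "4G", "8G")
```

Comparing a block like this against the resources section of the head pod's container (for instance from the pod's YAML) is one way to do the manual check.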

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • [x] Unit tests
    • [x] Manual tests
    • [ ] Testing is not required for this change

Bobbins228 avatar Jul 02 '24 09:07 Bobbins228

@ChristianZaccaria This is not expected behaviour at all :( I can have a look at adding some validation to ensure that the head/worker requests/limits are of the correct type. Good catch!

Bobbins228 avatar Jul 03 '24 09:07 Bobbins228

@Bobbins228 I couldn't get further, but I suppose maybe cluster.up() will already capture that and throw an error for using the wrong datatypes. However, you're right, there seems to be no validation when creating the yaml file.

ChristianZaccaria avatar Jul 03 '24 09:07 ChristianZaccaria

@ChristianZaccaria This is insane! It seems you can pretty much set any of the variables to whatever type you like. I will create a Jira for fixing the validation on all ClusterConfiguration parameters.
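One possible shape for that validation, sketched here under assumptions: HeadConfig is a hypothetical stand-in rather than the real ClusterConfiguration, and the accepted types (int or str) are a guess. The idea is to compare each field's value against its annotation when the object is created.

```python
from dataclasses import dataclass, fields
from typing import Union, get_args, get_origin


@dataclass
class HeadConfig:  # hypothetical stand-in, not the SDK's ClusterConfiguration
    head_cpu_requests: Union[int, str] = 2
    head_cpu_limits: Union[int, str] = 2
    head_memory_requests: Union[int, str] = 8
    head_memory_limits: Union[int, str] = 8

    def __post_init__(self):
        for f in fields(self):
            expected = f.type
            # Unpack Union[...] into its member types; otherwise use the type itself.
            allowed = get_args(expected) if get_origin(expected) is Union else (expected,)
            value = getattr(self, f.name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, allowed):
                names = [t.__name__ for t in allowed]
                raise TypeError(
                    f"{f.name} must be one of {names}, got {type(value).__name__}"
                )
```

In this sketch, HeadConfig(head_cpu_requests=[1]) raises TypeError at construction time, before any yaml would be written.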

Bobbins228 avatar Jul 03 '24 10:07 Bobbins228

Applied do not merge label until RHOAIENG-9259 is a priority again.

Bobbins228 avatar Jul 09 '24 16:07 Bobbins228

/retest

Bobbins228 avatar Sep 19 '24 12:09 Bobbins228

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KPostOffice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Sep 19 '24 13:09 openshift-ci[bot]