Updated Ray version to 2.20.0

Open Bobbins228 opened this issue 1 year ago • 2 comments

Issue link

RHOAIENG-6450

What changes have been made

Updated CFSDK ray dependency to 2.20.0

Verification steps

Setup

Notebook server ODH/RHOAI/Local

  • Clone this repository with git clone https://github.com/project-codeflare/codeflare-sdk.git
  • Check out this PR's branch
  • Run poetry build (install Poetry first if needed: pip install poetry)
  • Run pip install --force-reinstall dist/codeflare_sdk-0.0.0.dev0-py3-none-any.whl
  • Restart your notebook kernel
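After the kernel restart, one way to confirm that the rebuilt wheel pulled in the expected Ray release is to query the installed package metadata from a notebook cell. This is a minimal sketch and not part of the PR: the helper names installed_version and matches_expected are hypothetical, and the distribution name codeflare-sdk is an assumption based on the wheel filename above.

```python
from importlib import metadata

# Version named in this PR's title and description.
EXPECTED_RAY = "2.20.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(found, expected=EXPECTED_RAY):
    """True only when a version was found and it equals the expected one."""
    return found is not None and found == expected


if __name__ == "__main__":
    # "codeflare-sdk" is assumed to be the distribution name of the locally built wheel.
    for pkg in ("ray", "codeflare-sdk"):
        print(pkg, "->", installed_version(pkg))
    print("ray matches", EXPECTED_RAY, ":", matches_expected(installed_version("ray")))
```

If the last line reports False, the old environment is probably still active; reinstalling the wheel and restarting the kernel again is the first thing to try.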

Testing

Use this image for the following test scenarios: quay.io/mcampbel/ray:220-py39-cu118-dev

  • Run through demo notebooks
  • Ensure GPU utilization is working correctly
  • Ensure basic & local interactive demos work correctly
  • Ensure job submission works correctly

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • [x] Unit tests
    • [x] Manual tests
    • [ ] Testing is not required for this change

Bobbins228, May 08 '24 09:05

Tested these changes through the SDK demo notebooks using quay.io/mcampbel/ray:220-py39-cu118-dev as the ray image.

  • Was able to run the basic Ray demo.
  • Was able to run the job client demo and use GPUs.
  • While running the basic_interactive demo I see this info message (see screenshot below). I wasn't able to run through the entire demo because of issues with the training script, which I am currently looking into, but I was able to run ray.init successfully.
  • Was able to run the local interactive demo, including ray.init, but I also see the info message about the Python version mismatch.

[screenshot]

This is not an error as such and I was able to submit jobs successfully.
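A quick way to see whether a version mismatch notice like this is expected is to compare the notebook's interpreter with the Python version implied by the image tag. The sketch below assumes the py39 part of the tag encodes Python 3.9, which is a naming convention and not something the image guarantees; python_from_tag and versions_match are hypothetical helpers, not part of the SDK or Ray.

```python
import re
import sys

IMAGE = "quay.io/mcampbel/ray:220-py39-cu118-dev"


def python_from_tag(image):
    """Extract (major, minor) from a 'pyXY' token in the image tag, or None if absent."""
    tag = image.rsplit(":", 1)[-1]
    m = re.search(r"py(\d)(\d+)", tag)
    return (int(m.group(1)), int(m.group(2))) if m else None


def versions_match(image, local=None):
    """Compare the tag's Python version with the local interpreter (major.minor only)."""
    local = tuple(local) if local is not None else tuple(sys.version_info[:2])
    tagged = python_from_tag(image)
    return tagged is not None and tagged == local


if __name__ == "__main__":
    print("image Python:", python_from_tag(IMAGE))
    print("local Python:", tuple(sys.version_info[:2]))
    print("match:", versions_match(IMAGE))
```

A False result here would be consistent with the info message in the screenshot: the client and the cluster image run different Python versions, even though job submission still worked in this test.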

Fiona-Waters, May 15 '24 11:05

/retest

Bobbins228, May 28 '24 12:05

The e2e tests fail here because of the CertGenerator image used on the CFO side when setting up the Kind cluster. Here is proof of a passing e2e test with the new Ray image for the CertGenerator.

I have opened a PR to update the CFO with the new image.

Bobbins228, Jun 17 '24 11:06

/retest

Bobbins228, Jun 19 '24 08:06

/approve

Fiona-Waters, Jun 19 '24 11:06

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Fiona-Waters, Srihari1192

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot], Jun 19 '24 11:06