codeflare-sdk
Guided Demo ClusterConfig Image is not optional
Describe the Bug
In the Basic Ray Demo:
https://github.com/project-codeflare/codeflare-sdk/blob/main/demo-notebooks/guided-demos/0_basic_ray.ipynb
The ClusterConfiguration includes a commented-out image parameter with a note saying it is optional. With the parameter commented out, creating the cluster object raises the following error:
ValueError: Image must be specified in the ClusterConfiguration
If you revert the file to an older version, the image parameter is set explicitly:
# Create and configure our cluster object
# The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
cluster = Cluster(ClusterConfiguration(
    name='raytest',
    head_cpus='500m',
    head_memory=2,
    head_gpus=0, # For GPU enabled workloads set the head_gpus and num_gpus
    num_gpus=0,
    num_workers=2,
    min_cpus='250m',
    max_cpus=1,
    min_memory=4,
    max_memory=4,
    image="quay.io/rhoai/ray:2.23.0-py39-cu121",
    write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources
    # local_queue="local-queue-name" # Specify the local queue manually
))
This version runs without an error.
Codeflare Stack Component Versions
Please specify the component versions in which you have encountered this bug.
Codeflare SDK: 0.16.4
MCAD:
Instascale:
Codeflare Operator:
Other: OpenShift AI 2.11
Steps to Reproduce the Bug
- Clone repo
- Update token/api url
- Attempt to execute code
What Have You Already Tried to Debug the Issue?
Adding the image resolves the issue.
Expected Behavior
The image field does not appear to be optional. If it is optional only in newer versions, the example code should say so.
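One way the "optional" note in the demo could be made accurate is for a missing image to fall back to a default. The following is only a sketch of that idea, not the SDK's behavior: resolve_image and DEFAULT_IMAGE are hypothetical names, and the default value is simply the image string from the older notebook version.

```python
from typing import Optional

# Hypothetical default; this is the image string used in the older notebook version.
DEFAULT_IMAGE = "quay.io/rhoai/ray:2.23.0-py39-cu121"


def resolve_image(image: Optional[str], default: str = DEFAULT_IMAGE) -> str:
    """Return the user-supplied image, or the default when it is missing or blank."""
    if image is None or image.strip() == "":
        return default
    return image


# An omitted or blank image falls back; an explicit one is kept as given.
fallback = resolve_image(None)
explicit = resolve_image("example.registry/ray:custom")  # made-up image name
```

With a fallback like this, leaving the parameter commented out would no longer raise, and the comment in the notebook would match the behavior.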
Screenshots, Console Output, Logs, etc.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 3
1 # Create and configure our cluster object
2 # The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
----> 3 cluster = Cluster(ClusterConfiguration(
4 name='raytest',
5 head_cpus='500m',
6 head_memory=2,
7 head_gpus=0, # For GPU enabled workloads set the head_gpus and num_gpus
8 num_gpus=0,
9 num_workers=2,
10 min_cpus='250m',
11 max_cpus=1,
12 min_memory=4,
13 max_memory=4,
14 # image="quay.io/rhoai/ray:2.23.0-py39-cu121",
15 write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources
16 # local_queue="local-queue-name" # Specify the local queue manually
17 ))
File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:70, in Cluster.__init__(self, config)
63 """
64 Create the resource cluster object by passing in a ClusterConfiguration
65 (defined in the config sub-module). An AppWrapper will then be generated
66 based off of the configured resources to represent the desired cluster
67 request.
68 """
69 self.config = config
---> 70 self.app_wrapper_yaml = self.create_app_wrapper()
71 self._job_submission_client = None
72 self.app_wrapper_name = self.config.name
File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:132, in Cluster.create_app_wrapper(self)
127 raise TypeError(
128 f"Namespace {self.config.namespace} is of type {type(self.config.namespace)}. Check your Kubernetes Authentication."
129 )
131 # Validate image configuration
--> 132 self.validate_image_config()
134 # Before attempting to create the cluster AW, let's evaluate the ClusterConfig
136 name = self.config.name
File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:114, in Cluster.validate_image_config(self)
107 """
108 Validates that the image configuration is not empty.
109
110 :param image: The image string to validate
111 :raises ValueError: If the image is not specified
112 """
113 if self.config.image == "" or self.config.image == None:
--> 114 raise ValueError("Image must be specified in the ClusterConfiguration")
ValueError: Image must be specified in the ClusterConfiguration
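The check that raises here can be exercised without a cluster. The sketch below uses a hypothetical stand-in, FakeConfig, rather than the SDK's ClusterConfiguration, and a free function that mirrors the condition at line 113 of the traceback. The None default for image is an assumption; the condition rejects both None and an empty string either way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeConfig:
    """Hypothetical stand-in for ClusterConfiguration; only the fields needed here."""
    name: str
    image: Optional[str] = None  # assumption: value seen when image is commented out


def validate_image_config(config: FakeConfig) -> None:
    """Mirror the condition shown in the traceback (cluster.py, line 113)."""
    if config.image == "" or config.image is None:
        raise ValueError("Image must be specified in the ClusterConfiguration")


def check(config: FakeConfig) -> str:
    """Return 'ok' or the error message so the three cases are easy to compare."""
    try:
        validate_image_config(config)
        return "ok"
    except ValueError as err:
        return str(err)


results = {
    "omitted": check(FakeConfig(name="raytest")),
    "empty": check(FakeConfig(name="raytest", image="")),
    "set": check(FakeConfig(name="raytest", image="quay.io/rhoai/ray:2.23.0-py39-cu121")),
}
```

Only the "set" case passes, which matches the observation that adding the image resolves the issue.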
Affected Releases
Issue appears to have been introduced in this commit:
https://github.com/project-codeflare/codeflare-sdk/commit/5262e26aa4c828341c4df91e34bec9fc7b51fb44
Additional Context
Add as applicable and when known:
- OS: 1) MacOS, 2) Linux, 3) Windows: [1 - 3]
- OS Version: [e.g. RedHat Linux X.Y.Z, MacOS Monterey, ...]
- Browser (UI issues): 1) Chrome, 2) Safari, 3) Firefox, 4) Other (describe): [1 - 4 + description?]
- Browser Version (UI issues): [e.g. Firefox 97.0]
- Cloud: 1) AWS, 2) IBM Cloud, 3) Other (describe), or 4) on-premise: [1 - 4 + description?]
- Kubernetes: 1) OpenShift, 2) Other K8s [1 - 2 + description]
- OpenShift or K8s version: [e.g. 1.23.1]
- Other relevant info
Add any other information you think might be useful here.