codeflare-sdk
env parameter in DDPJobDefinition doesn't pass env variables to Ray
Describe the Bug
I want to submit a Ray job with environment variables specified, but the environment variables I provide are not passed to Ray.
The SDK documentation specifies that DDPJobDefinition has an env property, so I tried passing the environment variables there:
jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)
However, the submitted job did not contain the environment variables I passed.
Is this the correct way to pass environment variables using the SDK?
Codeflare Stack Component Versions
Please specify the component versions in which you have encountered this bug.
Codeflare SDK: 0.12.1
Ray image: quay.io/project-codeflare/ray:latest-py39-cu118
Steps to Reproduce the Bug
- Start ODH with the default science notebook
- Import the SDK Git repo into the notebook
- Open 2_basic_jobs.ipynb
- Add an env entry to the job definition:
jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    # script="mnist_disconnected.py", # training script for disconnected environment
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)
- Run the notebook up to and including the job submission
- Query the Ray REST API to get the submitted job definition, e.g.
curl -X GET -i 'http://<dashboard_hostname>/api/jobs/'
- Check the response: the env variables are missing from the submitted job
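The last two steps can also be scripted. The following is a minimal sketch, not part of the SDK: the helper names `fetch_jobs` and `missing_env_vars` are hypothetical, and it assumes the `/api/jobs/` endpoint returns a JSON list of job objects shaped like the one shown under Expected Behavior, reachable without authentication.

```python
import json
from urllib.request import urlopen


def fetch_jobs(dashboard_url):
    """Return the parsed JSON from the Ray dashboard's /api/jobs/ endpoint."""
    # Assumption: the dashboard is reachable over plain HTTP without a token.
    with urlopen(f"{dashboard_url.rstrip('/')}/api/jobs/") as response:
        return json.load(response)


def missing_env_vars(job, expected):
    """Return the expected variables that are absent or different in the job."""
    # runtime_env or env_vars may be missing or null in the response.
    runtime_env = job.get("runtime_env") or {}
    actual = runtime_env.get("env_vars") or {}
    return {key: value for key, value in expected.items() if actual.get(key) != value}
```

For example, calling `missing_env_vars(job, {"PIP_TRUSTED_HOST": "some-hostname"})` on each entry returned by `fetch_jobs("http://<dashboard_hostname>")` returns an empty dict only when the variable was passed through with the same value.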
What Have You Already Tried to Debug the Issue?
N/A
Expected Behavior
The submitted job contains the environment variables, for example:
{
  "type": "SUBMISSION",
  "job_id": null,
  "submission_id": "raysubmit_qtYVHfiyC7VhAPN7",
  "driver_info": null,
  "status": "FAILED",
  "entrypoint": "python /home/ray/jobs/mnist.py",
  "message": "Job entrypoint command failed with exit code 2, last available logs (truncated to 20,000 chars):\npython: can't open file '/home/ray/jobs/mnist.py': [Errno 2] No such file or directory\n",
  "error_type": null,
  "start_time": 1700576474095,
  "end_time": 1700576476706,
  "metadata": null,
  "runtime_env": {
    "pip": {
      "packages": ["pytorch_lightning==1.5.10", "ray_lightning", "torchmetrics==0.9.1", "torchvision==0.12.0"],
      "pip_check": false
    },
    "env_vars": {
      "PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
      "PIP_TRUSTED_HOST": "some-hostname"
    }
  },
  "driver_agent_http_address": "http://10.129.3.14:52365",
  "driver_node_id": "c3af4445c3cabfdc2291fb2fd6393da5850717eb3fd2aaeda3abe5f8"
}
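For comparison, a job with a `runtime_env` of this shape can be submitted through Ray's own `JobSubmissionClient`, bypassing DDPJobDefinition. This is only a sketch for checking that Ray itself accepts and reports `env_vars`. It is not how the SDK submits jobs, the helper names `build_runtime_env` and `submit_with_env` are hypothetical, and it does not establish whether the variables reach the pip install step.

```python
def build_runtime_env(packages, env_vars):
    """Build a Ray runtime_env dict with pip packages and environment variables."""
    return {
        "pip": {"packages": list(packages), "pip_check": False},
        # Ray expects env_vars to map strings to strings.
        "env_vars": {str(key): str(value) for key, value in env_vars.items()},
    }


def submit_with_env(dashboard_url, entrypoint, runtime_env):
    """Submit a job directly to the Ray dashboard and return its submission ID."""
    # Imported lazily so build_runtime_env works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    # Assumption: the dashboard URL is reachable without extra auth headers.
    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)
```

If a job submitted this way shows `env_vars` in the `/api/jobs/` response while the SDK-submitted one does not, the variables are being dropped before the request reaches Ray.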
Screenshots, Console Output, Logs, etc.
Affected Releases
SDK 0.12.1
Additional Context
Add as applicable and when known:
- OS: 1) MacOS, 2) Linux, 3) Windows: [1 - 3]
- OS Version: [e.g. RedHat Linux X.Y.Z, MacOS Monterey, ...]
- Browser (UI issues): 1) Chrome, 2) Safari, 3) Firefox, 4) Other (describe): [1 - 4 + description?]
- Browser Version (UI issues): [e.g. Firefox 97.0]
- Cloud: 1) AWS, 2) IBM Cloud, 3) Other (describe), or 4) on-premise: [1 - 4 + description?]
- Kubernetes: 1) OpenShift, 2) Other K8s [1 - 2 + description]
- OpenShift or K8s version: [e.g. 1.23.1]
- Other relevant info
Add any other information you think might be useful here.