skypilot
User Ray and Sky Ray version conflicts
Justin's requirements.txt installs Ray. If that Ray version differs from our Ray version, it will cause problems. Maybe we should encourage users to install their dependencies into a conda environment?
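One way to catch this early would be to scan the task's requirements.txt for a Ray pin before installing anything. The sketch below is a hypothetical helper, not part of Sky. It assumes the conflicting Ray version appears as an exact `ray==X` or `ray[extras]==X` pin. Ranges, unpinned `ray`, and transitive installs are not detected.

```python
import re
from typing import Optional

# Hypothetical pattern: an exact Ray pin such as "ray==1.9.2" or "ray[default]==1.9.2".
_RAY_PIN = re.compile(r"^\s*ray(\[[^\]]*\])?\s*==\s*([^\s;#]+)", re.IGNORECASE)


def find_ray_pin(requirements_text: str) -> Optional[str]:
    """Return the exactly pinned Ray version, or None if Ray is absent or not pinned with ==."""
    for line in requirements_text.splitlines():
        match = _RAY_PIN.match(line)
        if match:
            return match.group(2)
    return None


def conflicts_with_cluster(requirements_text: str, cluster_ray_version: str) -> bool:
    """True if requirements.txt pins a Ray version different from the cluster's Ray."""
    pin = find_ray_pin(requirements_text)
    return pin is not None and pin != cluster_ray_version


if __name__ == "__main__":
    reqs = "numpy==1.21.0\nray[default]==1.9.2\n"
    print(find_ray_pin(reqs))                      # 1.9.2
    print(conflicts_with_cluster(reqs, "1.10.0"))  # True
```

A check like this could warn the user, or suggest a separate conda environment, rather than silently overwriting the Ray that the cluster runtime depends on.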
It looks like for Justin, the version of Ray doesn't matter.
Running into this for Balsa.
(sky-cmd pid=11365) RuntimeError: Version mismatch: The cluster was started with:
(sky-cmd pid=11365) Ray: 1.10.0
(sky-cmd pid=11365) Python: 3.9.4
(sky-cmd pid=11365) This process on node 172.31.65.254 was started with:
(sky-cmd pid=11365) Ray: 1.9.2
(sky-cmd pid=11365) Python: 3.7.12
The former is Sky's remote Ray version (which we control) plus the Python version (which we don't control; it depends on the AMI!). The latter is this task's activated conda environment.
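The error compares two (Ray, Python) pairs: the one the cluster runtime was started with and the one in the current process. The sketch below mimics that comparison as observed in the logs in this thread. It is not Ray's actual implementation. `RuntimeVersions` and `describe_mismatch` are hypothetical names. The full version strings are compared because, as a later report in this thread shows, a patch-level difference (3.9.4 vs 3.9.12) was enough to trigger the error.

```python
import platform
from typing import NamedTuple, Optional


class RuntimeVersions(NamedTuple):
    """Hypothetical record of the versions a Ray process was started with."""
    ray: str
    python: str


def local_python_version() -> str:
    # Full "major.minor.micro" string, e.g. "3.9.12".
    return platform.python_version()


def describe_mismatch(cluster: RuntimeVersions, local: RuntimeVersions) -> Optional[str]:
    """Return an error-style message if the two version pairs differ, else None."""
    if cluster == local:
        return None
    return "\n".join([
        "Version mismatch: The cluster was started with:",
        f"    Ray: {cluster.ray}",
        f"    Python: {cluster.python}",
        "This process was started with:",
        f"    Ray: {local.ray}",
        f"    Python: {local.python}",
    ])


if __name__ == "__main__":
    cluster = RuntimeVersions(ray="1.10.0", python="3.9.4")
    task_env = RuntimeVersions(ray="1.9.2", python="3.7.12")
    print(describe_mismatch(cluster, task_env))
```

Both fields must match, so aligning only the Ray version (or only the Python version) in the task's environment is not enough to avoid the error.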
For some reason, I didn't see this error on this same cluster before today.
~Update: To get around the above, I had to manually make the Sky task's conda env use the cluster's Ray/python versions.~
Scratch that. This remains a problem. After I made the Sky task's conda env use Python 3.9.4, installing requirements.txt failed because an old dependency, torch==1.4.0, is not supported.
Update: manually got the task running by installing Ray + Sky inside the Sky task conda env.
@pounde ran into this error again:
RuntimeError: Version mismatch: The cluster was started with:
Ray: 1.10.0
Python: 3.9.4
This process on node 192.168.15.204 was started with:
Ray: 1.10.0
Python: 3.9.12
After a sky.exec() that ran sudo conda install --file <file>, the system Python was changed from 3.9.4 (which was used to launch the Sky/Ray runtime) to 3.9.12.
I can confirm that installing the packages into a new environment avoids the problem, i.e.,
conda create -n