
User Ray and Sky Ray version conflicts

Open gmittal opened this issue 2 years ago • 5 comments

Justin’s requirements.txt will install Ray. If that version of Ray does not match our Ray version, it will cause problems. Maybe we should encourage users to use a conda environment?

gmittal avatar Mar 30 '22 07:03 gmittal

It looks like for Justin, the version of Ray doesn't matter.

michaelzhiluo avatar Mar 30 '22 23:03 michaelzhiluo

Running into this for Balsa.

(sky-cmd pid=11365) RuntimeError: Version mismatch: The cluster was started with:
(sky-cmd pid=11365)     Ray: 1.10.0
(sky-cmd pid=11365)     Python: 3.9.4
(sky-cmd pid=11365) This process on node 172.31.65.254 was started with:
(sky-cmd pid=11365)     Ray: 1.9.2
(sky-cmd pid=11365)     Python: 3.7.12

The former is Sky's remote Ray version (controlled) + Python version (uncontrolled by us, depends on the AMI!). The latter is this task's activated conda environment.
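A minimal sketch of how a task could detect this kind of mismatch up front, before it talks to the cluster's Ray runtime. The helper names (`current_versions`, `find_version_mismatches`) are illustrative assumptions, not part of Sky or Ray, and the cluster versions are simply copied from the error message above. The installed Ray version is read with the standard-library `importlib.metadata`, so Ray itself is never imported.

```python
import sys
from importlib import metadata


def current_versions():
    """Return (ray_version, python_version) for the running interpreter."""
    try:
        ray_version = metadata.version("ray")
    except metadata.PackageNotFoundError:
        ray_version = None  # Ray is not installed in this environment
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    return ray_version, python_version


def find_version_mismatches(cluster, process):
    """Compare two (ray, python) version pairs and list the differing components."""
    names = ("Ray", "Python")
    return [
        f"{name}: cluster={want}, process={got}"
        for name, want, got in zip(names, cluster, process)
        if want != got
    ]


if __name__ == "__main__":
    # Hypothetical cluster versions, taken from the error message above.
    cluster = ("1.10.0", "3.9.4")
    for line in find_version_mismatches(cluster, current_versions()):
        print("mismatch:", line)
```

The comparison is on the full version string, since the errors in this thread show that even a patch-level difference is reported as a mismatch.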

For some reason, I didn't see this error on this same cluster before today.

concretevitamin avatar Apr 15 '22 17:04 concretevitamin

~Update: To get around the above, I had to manually make the Sky task's conda env use the cluster's Ray/python versions.~

Scratch that. This remains a problem. After I made the Sky task's conda env use Python 3.9.4, installing requirements.txt failed due to an old dep torch==1.4.0 not being supported.

Update: manually got the task running by installing Ray + Sky inside the Sky task conda env.

concretevitamin avatar Apr 23 '22 18:04 concretevitamin

This error

RuntimeError: Version mismatch: The cluster was started with:
    Ray: 1.10.0
    Python: 3.9.4
This process on node 192.168.15.204 was started with:
    Ray: 1.10.0
    Python: 3.9.12

was run into again by @pounde. After a sky.exec() that ran sudo conda install --file <file>, the system Python changed from 3.9.4 (which was used to launch the Sky/Ray runtime) to 3.9.12.
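To see at a glance which component drifted, the message can be parsed. The sketch below is only an illustration: the function names are made up here, and the regular expression assumes the unprefixed message layout quoted above (it would need adjusting for log lines that carry a `(sky-cmd pid=...)` prefix).

```python
import re

ERROR_TEXT = """\
RuntimeError: Version mismatch: The cluster was started with:
    Ray: 1.10.0
    Python: 3.9.4
This process on node 192.168.15.204 was started with:
    Ray: 1.10.0
    Python: 3.9.12
"""


def parse_mismatch(text):
    """Return (cluster, process) dicts mapping component name to version."""
    pairs = re.findall(r"^\s*(Ray|Python):\s*(\S+)\s*$", text, flags=re.MULTILINE)
    if len(pairs) != 4:
        raise ValueError("unexpected message format")
    # The first two entries describe the cluster, the last two this process.
    return dict(pairs[:2]), dict(pairs[2:])


def differing_components(text):
    """Map each component that differs to its (cluster, process) versions."""
    cluster, process = parse_mismatch(text)
    return {
        name: (cluster[name], process[name])
        for name in cluster
        if cluster[name] != process.get(name)
    }


print(differing_components(ERROR_TEXT))
# prints {'Python': ('3.9.4', '3.9.12')}
```

Here Ray matches and only the Python patch version differs, which is consistent with the conda install having upgraded the system Python.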

concretevitamin avatar Jul 18 '22 17:07 concretevitamin

I can confirm that installing the packages into a new environment negates the problem, i.e., conda create -n --file <req-file.txt>
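A rough sketch of scripting that workaround, so the task's packages go into their own environment and the Python that launched the Sky/Ray runtime is left untouched. The environment name `task-env` and the helper `conda_create_command` are hypothetical placeholders; `--name`, `--file`, and `--yes` are standard `conda create` options. The helper only builds the command, and actually running it is left commented out.

```python
import shlex


def conda_create_command(env_name, requirements_file):
    """Build a `conda create` command that installs a package list into a new env."""
    if not env_name:
        raise ValueError("env_name must be non-empty")
    return [
        "conda", "create", "--yes",
        "--name", env_name,           # hypothetical environment name
        "--file", requirements_file,  # the user's package list
    ]


cmd = conda_create_command("task-env", "req-file.txt")
print(shlex.join(cmd))

# To execute it for real (requires conda on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```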

pounde avatar Jul 18 '22 18:07 pounde