[AWS VM Cluster Launcher] AWS Cluster launcher installs nightly Ray by default
What happened + What you expected to happen
By default, the AWS VM Cluster launcher currently installs nightly Ray instead of the latest released version. This is pretty bad because nightly Ray versions have not gone through the release process yet, so they are not sufficiently tested.
See https://github.com/ray-project/ray/blob/4f754040764353b3e555327ef19c6778d4fe317b/python/ray/autoscaler/aws/defaults.yaml#L127
Theoretically this should be the "latest" released version, but in practice it is an unreleased version (currently the latest master). You can see this by, e.g., downloading the wheel and looking at the __commit__ field in __init__.py, which at the time of writing is f34cfcdd2d004a5b97a8c674023eecebed3567ce.
EDIT: It seems the convention we use here at the moment is that latest is a nightly, see the links in https://docs.ray.io/en/latest/ray-overview/installation.html#daily-releases-nightlies -- so we should fix the defaults.yaml to install the latest release.
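For anyone who wants to repeat the wheel check described above, here is a minimal sketch. It assumes the wheel stores `__commit__` as a plain string assignment in `ray/__init__.py`; the helper name `commit_from_wheel` and the default path are made up for illustration and are not part of Ray.

```python
import re
import zipfile
from typing import Optional

# Assumption: __commit__ is written as a literal string assignment.
COMMIT_RE = re.compile(r"""^__commit__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def commit_from_wheel(wheel_path: str, init_path: str = "ray/__init__.py") -> Optional[str]:
    """Return the __commit__ string recorded in a wheel's __init__.py, or None."""
    # A wheel is a zip archive, so the file can be read without installing it.
    with zipfile.ZipFile(wheel_path) as wheel:
        source = wheel.read(init_path).decode("utf-8")
    match = COMMIT_RE.search(source)
    return match.group(1) if match else None
```

Calling `commit_from_wheel("path/to/downloaded.whl")` on the downloaded wheel should return the commit hash if that assumption holds, which can then be compared against the release tags.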
Versions / Dependencies
Ray 2.4
Reproduction script
N/A
Issue Severity
High: It blocks me from completing my task.
@wuisawesome The explanation makes sense but it looks to me like there's a small chance it's intentional, can you confirm the explanation? Also, is it true that defaults.yaml is used to populate all of the unspecified fields in any user's YAML (for example example-minimal.yaml)?
Yes, defaults is used to populate all unspecified fields.
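To illustrate why this matters for a minimal user YAML, here is a rough sketch of the fill-in behaviour. It is not Ray's actual merge code (which may handle nested fields differently); `fill_defaults` is a hypothetical helper and the config values are placeholders.

```python
import copy
from typing import Any, Dict


def fill_defaults(user_config: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a config where every top-level key missing from user_config comes from defaults."""
    merged = copy.deepcopy(defaults)
    # Keys the user specified always win over the defaults.
    merged.update(copy.deepcopy(user_config))
    return merged


# Placeholder values, not the real contents of defaults.yaml.
defaults = {"max_workers": 2, "setup_commands": ["pip install <nightly wheel>"]}
minimal = {"cluster_name": "minimal"}

config = fill_defaults(minimal, defaults)
# The minimal config never mentions setup_commands, so it inherits the default install line.
print(config["setup_commands"])
```

So under this model, a user who never writes `setup_commands` still gets whatever install command the defaults contain.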
I think this is probably historical (since Ray wasn't stable before). Given that the default docker image we recommend is 'latest', I would recommend updating the line to just say 'pip install "ray[default]"'
Thanks, that clarifies things completely!
@pcmoritz are you running ray up from a nightly wheel or from a released wheel?
I checked that the offending line is still there on the release branch (for Ray 2.4 at least.)
I chatted with @can-anyscale about this and he should provide his opinion. Generally, I think there are 2 scenarios we should consider.
1. Running ray up where my laptop has a nightly commit. In this case, I think the remote cluster should be a nightly commit since the Ray CLI is not forwards-compatible (e.g. if we introduce a new flag that ray up uses, the default release won't have that flag yet, so we'll be broken).
2. Running ray up where my laptop has the default version of ray (currently 2.4.0). In this case the two options are for the remote cluster to have either the default or the nightly ray version.
In the case of (2), I think the main benefit of using default is that it is a little more stable since we run more tests against default versions than nightly versions.
There's also probably an argument that ray up example_minimal.yaml should be reproducible. That would also have some implications about which ray version we should launch, but that's a whole different undertaking and there are tradeoffs about whether we'd want to do that.
We've now chatted offline about it with @aslonnie too, and it seems like a solution we are all happy with right now is that for wheels that CI builds, we should have the property that whatever version of ray the cli is using in ray up is the version of ray that the cluster will run.
e.g. ray up on 2.4.0 will launch a 2.4.0 cluster, ray up on 2.3.0 will launch a 2.3 cluster, ray up on master@
To implement this, we can probably just have CI encode the branch into the wheel, similar to how the commit is encoded.
I can chat with you about implementation @architkulkarni if you're ok with this at a high level.
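As a rough sketch of the "cluster runs whatever version the CLI has" property, assuming released wheels report a plain X.Y.Z version string and anything else is a nightly or dev build. `setup_command_for` is a hypothetical helper, not an existing Ray function, and the nightly case is deliberately left unresolved since it depends on how CI ends up encoding the branch.

```python
import re
from typing import Optional

# Assumption: release builds have a plain "X.Y.Z" version; anything else is a dev build.
RELEASE_RE = re.compile(r"\d+\.\d+\.\d+")


def setup_command_for(local_version: str) -> Optional[str]:
    """Return a pip command that pins the cluster to the CLI's released version.

    Returns None for non-release versions, where the caller would need a wheel
    matching the local commit or branch instead.
    """
    if RELEASE_RE.fullmatch(local_version):
        return f'pip install -U "ray[default]=={local_version}"'
    return None


print(setup_command_for("2.4.0"))
print(setup_command_for("3.0.0.dev0"))
```

With this, ray up on 2.4.0 would produce a setup command pinned to 2.4.0, while a dev version falls through to whatever nightly handling we decide on.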
This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.
Please comment and remove the pending-cleanup label if you believe this issue should remain open.
Thanks for contributing to Ray!