[core][ci] remove the availability_zone which causes an ssh timeout issue in the aws cluster launcher yaml

Open rueian opened this issue 3 weeks ago • 1 comments

Description

The availability_zone restrictions cause the aws_cluster_launcher_full tests to fail with SSH timeouts: after launching the cluster, the cluster launcher tries to set it up over SSH, but some subnets in our CI environment are private.

After commenting out availability_zone, the tests pass: https://buildkite.com/ray-project/release/builds/71069

Related issues

Fixes https://github.com/anyscale/ray/issues/552

rueian avatar Dec 07 '25 07:12 rueian

hm this is a behavior change because IIRC that example defines the defaults for the AWS cluster launcher

could we fix the CI config/setup instead? or override it just for the test

edoakes avatar Dec 09 '25 15:12 edoakes

> hm this is a behavior change because IIRC that example defines the defaults for the AWS cluster launcher
>
> could we fix the CI config/setup instead? or override it just for the test

Maybe we can do both. I changed it to override the setting (removing the zone restrictions) just for the test here.
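
As a rough illustration of what a test-only override could look like (a sketch, not the actual change in this PR): the helper name `without_availability_zone` and the sample values below are hypothetical, and the config is shown as an already-parsed dict rather than loaded from the real YAML file.

```python
import copy


def without_availability_zone(config: dict) -> dict:
    """Return a copy of a cluster launcher config with provider.availability_zone removed.

    Hypothetical helper for illustration; the example config that defines the
    defaults is left unchanged, so only the test sees the override.
    """
    overridden = copy.deepcopy(config)
    # pop() with a default is a no-op when the key is already absent.
    overridden.get("provider", {}).pop("availability_zone", None)
    return overridden


# Sample values are assumptions, loosely modeled on an AWS cluster launcher YAML.
example_config = {
    "cluster_name": "default",
    "provider": {
        "type": "aws",
        "region": "us-west-2",
        "availability_zone": "us-west-2a,us-west-2b",
    },
}

test_config = without_availability_zone(example_config)
```

Working on a deep copy keeps the defaults in the example intact, which addresses the behavior-change concern while still dropping the zone restriction for the test.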

rueian avatar Dec 11 '25 03:12 rueian