[Cloud] AWS: When using spot instances, a single availability zone is always selected

Open iirekm opened this issue 2 years ago • 2 comments

What happened + What you expected to happen

With the config region: us-east-1, the last AZ (us-east-1f) is always selected for the AWS spot request.
When I list all AZs manually (availability_zone: us-east-1d,us-east-1e,us-east-1f,us-east-1a,us-east-1b,us-east-1c), the first one is always selected!

The expected behavior is to select ALL availability zones. This may matter less for on-demand instances, but spot instances are often unavailable in one zone while available in others (especially when it comes to GPUs).

Versions / Dependencies

recent

Reproduction script

Issue Severity

Medium: It is a significant difficulty but I can work around it.

iirekm avatar Apr 28 '22 19:04 iirekm

I found a workaround: go to https://aws.amazon.com/ec2/spot/instance-advisor/ and find the instance type that is least likely to be interrupted. Multi-AZ support would still be useful, though, because single AZs sometimes fail.

iirekm avatar Apr 29 '22 04:04 iirekm

I'll add a heavy ➕ to this ticket.

mdagost avatar Aug 10 '22 19:08 mdagost

I suspect the issue here is related to the explainability of the AWS node provider's actions. When a request fails, the provider implicitly falls back to the other AZs but then reports only the last error, which makes it seem as if it tried only a single AZ.
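
To make the suspected behavior concrete, here is a minimal sketch of a fallback loop that keeps every per-AZ error instead of only the last one. This is not Ray's actual implementation: `launch_with_fallback`, `AllZonesFailed`, and the `launch` callable are hypothetical names used only for illustration, and the real provider may structure its retries differently.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllZonesFailed(Exception):
    """Hypothetical error raised when every availability zone rejected the request."""

    def __init__(self, errors: List[Tuple[str, Exception]]):
        self.errors = errors
        details = "; ".join(f"{az}: {err}" for az, err in errors)
        super().__init__(f"launch failed in all {len(errors)} zones ({details})")


def launch_with_fallback(
    zones: Sequence[str], launch: Callable[[str], T]
) -> Tuple[str, T]:
    """Try each zone in order and return (zone, result) for the first success.

    `launch` is a stand-in (an assumption) for whatever call actually
    requests the instance in a given zone.
    """
    errors: List[Tuple[str, Exception]] = []
    for az in zones:
        try:
            return az, launch(az)
        except Exception as err:
            # Record the failure for this zone rather than overwriting the
            # previous one, so the final report covers every zone tried.
            errors.append((az, err))
    raise AllZonesFailed(errors)
```

With a comma-separated setting such as the availability_zone value from the report, the zone list could be built with `[z.strip() for z in value.split(",")]`. If every zone fails, the raised error names each zone together with its own failure, which would make it obvious that more than one AZ was attempted.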

@iirekm @mdagost if either of you is still running into this issue, could you check whether spot instances are actually available in the other AZs? I suspect they won't be, for the reason above.

Note that this is still a usability issue that we should try to fix; I'm just trying to understand the exact issue.

wuisawesome avatar Nov 08 '22 22:11 wuisawesome