[Cloud] AWS: When using spot instances, a single availability zone is always selected
What happened + What you expected to happen
With the config region: us-east-1, the last AZ (us-east-1f) is always selected for the AWS spot request.
When I list all AZs manually (availability_zone: us-east-1d,us-east-1e,us-east-1f,us-east-1a,us-east-1b,us-east-1c), the first one is always selected!
The expected behavior is to select ALL availability zones. This may matter less for on-demand instances, but spot instances are often unavailable in one zone while available in others (especially when it comes to GPUs).
Versions / Dependencies
recent
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.
I found a workaround: go to https://aws.amazon.com/ec2/spot/instance-advisor/ and find the instance type that is least likely to be interrupted. Still, multi-AZ support would be useful, because single AZs sometimes fail.
I'll add a heavy ➕ to this ticket.
I suspect the issue here is related to the explainability of the AWS node provider's actions. When a request fails, it implicitly falls back to other AZs, then reports only the last error, which makes it seem like it tries only a single AZ.
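To illustrate the reporting problem described above, here is a minimal sketch of a zone fallback loop that keeps every per-zone error instead of only the last one. It is not Ray's actual node provider code: `launch_with_fallback`, `AllZonesFailed`, and the `launch` callback are hypothetical names made up for this example.

```python
from typing import Callable, Dict, List


class AllZonesFailed(Exception):
    """Raised when no availability zone could satisfy the request."""

    def __init__(self, errors: Dict[str, Exception]):
        self.errors = errors
        details = "; ".join(f"{az}: {err}" for az, err in errors.items())
        super().__init__(f"launch failed in all zones ({details})")


def launch_with_fallback(zones: List[str], launch: Callable[[str], str]) -> str:
    """Try each zone in order and return the first successful instance ID.

    `launch` is a hypothetical callback that takes a zone name and either
    returns an instance ID or raises an exception.
    """
    errors: Dict[str, Exception] = {}
    for az in zones:
        try:
            return launch(az)
        except Exception as err:
            # Record the failure for this zone, then fall through to the next.
            errors[az] = err
    # Every zone failed: surface all errors, not just the last one.
    raise AllZonesFailed(errors)
```

With an error like this, a user would see which zones were attempted and why each one failed, so a fallback across AZs would no longer look like a single-AZ request.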
@iirekm @mdagost if either of you are still running into this issue, could you try to check if spot instances are actually available in the other AZs? I suspect they won't be for the reason above.
Note that this is still a usability issue that we should try to fix; I'm just trying to understand the exact issue.
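One rough way to run the check suggested above is to look at recent spot price history per AZ with boto3. The sketch below assumes an EC2 client created with `boto3.client("ec2", region_name="us-east-1")`; the helper name `spot_prices_by_zone` is made up for this example. Note that a price entry only shows that the instance type is offered as spot in that zone, not that capacity is available right now, and the sketch reads only the first page of results.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def spot_prices_by_zone(ec2_client: Any, instance_type: str, hours: int = 1) -> Dict[str, List[str]]:
    """Group recent spot prices for `instance_type` by availability zone.

    `ec2_client` is assumed to be a boto3 EC2 client. Only the first page
    of results is read (no NextToken handling).
    """
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    response = ec2_client.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=start,
    )
    by_zone: Dict[str, List[str]] = {}
    for entry in response["SpotPriceHistory"]:
        # SpotPrice is returned as a string, e.g. "0.9180".
        by_zone.setdefault(entry["AvailabilityZone"], []).append(entry["SpotPrice"])
    return by_zone
```

For example, calling `spot_prices_by_zone(client, "p3.2xlarge")` (the instance type here is only an illustration) returns a dict keyed by AZ name; zones missing from the result had no spot price entries for that type in the chosen window.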