kuberay
Add Fake TPU e2e Autoscaling Test Cases
Why are these changes needed?
This PR adds a fake TPU test case, similar to the existing fake GPU test case for autoscaling, that uses detached actors to verify that single-host and multi-host TPU autoscaling behave as expected. The behaviors tested include:
- (1) Creating a detached actor that requests `resources: {"TPU": 4}` will scale up a Ray TPU worker
- (2) For a multi-host worker group, the number of workers created should equal `replicas * numOfHosts`
- (3) Terminating detached actors scheduled on a multi-host worker group replica will cause the entire replica to be scaled down
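The worker-count arithmetic in (2) and (3) can be sketched as a small model. This is illustrative only and is not the test code in this PR: `WorkerGroup`, `expected_workers`, and `scale_down_replica` are hypothetical names, and the values `replicas=2`, `num_of_hosts=4` are assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    """Hypothetical stand-in for a multi-host TPU worker group."""

    replicas: int
    num_of_hosts: int  # mirrors the numOfHosts field mentioned above

    def expected_workers(self) -> int:
        # Behavior (2): each replica contributes num_of_hosts workers.
        return self.replicas * self.num_of_hosts

    def scale_down_replica(self) -> None:
        # Behavior (3): a replica is removed as a unit, so all of its
        # hosts go away together rather than one worker at a time.
        if self.replicas == 0:
            raise ValueError("no replica left to scale down")
        self.replicas -= 1


# Assumed example values, not taken from the PR.
group = WorkerGroup(replicas=2, num_of_hosts=4)
workers_before = group.expected_workers()
group.scale_down_replica()
workers_after = group.expected_workers()
```

In this model, removing one replica lowers the worker count by exactly `num_of_hosts`, which is the property the multi-host scale-down check relies on.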
Edit: Removed the test for idle nodes being scaled down, since it requires setting the timeout to a much higher value, and scale-down of multi-host replicas is still tested.
Related issue number
Checks
- [x] I've made sure the tests are passing.
- Testing Strategy
  - [x] Unit tests
  - [x] Manual tests
  - [ ] This PR is not tested :(