skypilot
Support TPU Pod
This PR enables TPU Pod usage. To switch from a single TPU to a TPU Pod, the user only needs to change `accelerators: tpu-v2-8` to `accelerators: tpu-v2-32`.
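For illustration, the change might look like this in a task YAML. This is a minimal sketch: only the `accelerators` values come from this PR; placing the field under `resources` is assumed from SkyPilot's usual task layout, and the rest of the task (setup, run, file mounts) is omitted.

```yaml
# Minimal sketch of the relevant part of a task YAML (other fields omitted).
resources:
  # Single TPU (before):
  # accelerators: tpu-v2-8
  # TPU Pod (after):
  accelerators: tpu-v2-32
```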
`sky launch` and `sky exec` will sync file mounts, run setup, and execute code on all the TPU Pod nodes.
Note: GCP does not support stopping a TPU Pod (ref).
=====
TODO:
- [x] Run the MNIST example with `tpu-v2-32` and `tpu-v3-32`:
sky launch examples/tpu/tpuvm_mnist.yaml --gpus tpu-v2-32 -c podv2
sky launch examples/tpu/tpuvm_mnist.yaml --gpus tpu-v3-32 -c podv3
- [ ] Write a TPU Pod test