Support TPU Pod

Open infwinston opened this issue 1 year ago • 0 comments

This PR enables TPU Pod usage. To switch from a single TPU to a TPU Pod, the user only needs to change accelerators: tpu-v2-8 to accelerators: tpu-v2-32.

sky launch and sky exec sync file mounts, run setup, and execute the task's commands on every node of the TPU Pod.
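
As a rough sketch of the accelerator change described above, a task YAML might look like the following. The setup and run commands are placeholders invented for illustration, not the contents of examples/tpu/tpuvm_mnist.yaml, and a real TPU task file may need additional resource fields that are omitted here.

```yaml
# Minimal sketch of a SkyPilot task file (assumed layout, not copied from the repo).
resources:
  # Single TPU device:
  # accelerators: tpu-v2-8
  # TPU Pod slice: only this value changes.
  accelerators: tpu-v2-32

# Placeholder commands (hypothetical); per this PR, setup and run
# are executed on every node of the TPU Pod.
setup: |
  echo "setup step"

run: |
  echo "run step"
```

The accelerator can also be overridden on the command line with --gpus, as the launch commands in the TODO list do.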

Note: GCP does not support stopping a TPU Pod (ref).

=====

TODO:

  • [x] Run the MNIST example with tpu-v2-32 and tpu-v3-32:
      sky launch examples/tpu/tpuvm_mnist.yaml --gpus tpu-v2-32 -c podv2
      sky launch examples/tpu/tpuvm_mnist.yaml --gpus tpu-v3-32 -c podv3
  • [ ] Write a TPU Pod test

infwinston, Jul 21 '22 03:07