
Add support for TPU devices

Open · vfdev-5 opened this issue 2 years ago · 18 comments

Clear and concise description of the problem

It would be good to provide an option to select the accelerator as TPU instead of GPU. We could also auto-select the TPU accelerator when the template is opened in Colab, and add torch_xla installation steps.

What to do: 0) Try a template with TPUs. Choose the distributed training option with 8 processes and the spawning option. "Open in Colab" one template, for example the vision classification template, install torch_xla manually (see https://colab.research.google.com/drive/1E9zJrptnLJ_PKhmaP5Vhb6DTVRvyrKHx) and run the code with the xla-tpu backend: python main.py --nproc_per_node 8 --backend xla-tpu. If everything is set up correctly, training should run (see the launch sketch after the list below).

  1. Update UI
  • Add a drop-down menu for backend selection: "nccl" and "xla-tpu" in "Training Options"
  • When the user selects "xla-tpu", training should only be distributed, with 8 processes and "Run the training with torch.multiprocessing.spawn"
  2. Update content: README.md and other impacted files
  3. If exported to Colab, we need to make sure that the accelerator is set to "TPU" (see the notebook metadata sketch after this list)
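
For step 0, the templates presumably launch distributed runs through ignite.distributed's `Parallel` helper. Here is a minimal sketch of what the xla-tpu launch path looks like; the `training` function and `config` dict are placeholders for whatever the generated main.py actually defines:

```python
# Minimal sketch: launching a training loop on 8 TPU cores via
# ignite.distributed. `training` and `config` are placeholders.
import ignite.distributed as idist


def training(local_rank, config):
    # With backend "xla-tpu", idist.device() resolves to the XLA device
    # assigned to this process (one of the 8 TPU cores).
    device = idist.device()
    print(f"rank {idist.get_rank()} running on {device}")
    # ... build model / dataloaders / trainer here ...


if __name__ == "__main__":
    config = {}
    # Parallel spawns nproc_per_node processes when backend is "xla-tpu",
    # matching the "Run the training with torch.multiprocessing.spawn" option.
    with idist.Parallel(backend="xla-tpu", nproc_per_node=8) as parallel:
        parallel.run(training, config)
```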
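For item 3, Colab selects the runtime from the notebook's metadata. A sketch of how the exporter could force a TPU runtime when writing the .ipynb file; the function name and `notebook_path` are hypothetical, and since the exporter itself lives in the web app, this Python is illustrative only:

```python
# Sketch: setting the "TPU" accelerator in an exported Colab notebook.
# `set_tpu_accelerator` and `notebook_path` are hypothetical names.
import json


def set_tpu_accelerator(notebook_path: str) -> None:
    """Rewrite a notebook so Colab opens it with a TPU runtime."""
    with open(notebook_path) as f:
        nb = json.load(f)
    # Colab reads the requested runtime from the notebook metadata;
    # "accelerator": "TPU" asks for a TPU when the notebook is opened.
    nb.setdefault("metadata", {})["accelerator"] = "TPU"
    with open(notebook_path, "w") as f:
        json.dump(nb, f, indent=1)
```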

Suggested solution

Alternative

Additional context

vfdev-5 · Jul 20 '21 22:07