Zach Mueller

Results: 368 comments by Zach Mueller

Hi @jianguoz! Sorry, you responded just as I was off on vacation that week. As the directions state, you should run `ssh-add ~/.ssh/google_compute_engine` on the machine to get `accelerate tpu-config`...

Hey @jianguoz, glad to hear `accelerate launch` is doing its job, setting that up right, and starting training! I'll look into `accelerate tpu-config` tomorrow and see if I missed...

You can just not use `accelerate config` in this instance. E.g.:

```bash
accelerate launch \
  --mixed_precision=bf16 \
  --machine_rank 0 \
  --num_machines 1 \
  --main_process_port 11135 \
  --num_processes $GPUS_PER_NODE \
  fastcomposer/train.py...
```

Also, while it is SLURM, if it's just one machine you don't need to add `--machine_rank` and `--num_machines`
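
To illustrate the point about single-machine runs, here is a minimal sketch that assembles `accelerate launch` arguments and adds `--num_machines` and `--machine_rank` only when more than one machine is involved. The helper `build_launch_args` and the script name `train.py` are hypothetical placeholders, not part of Accelerate, and the script only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch only: build_launch_args is a hypothetical helper, not an Accelerate API.
# Arguments: <num_machines> <machine_rank> <num_processes>
build_launch_args() {
  local num_machines="$1" machine_rank="$2" num_processes="$3"
  local args=(--mixed_precision=bf16 --num_processes "$num_processes")
  # The multi-node flags are only added when there is more than one machine.
  if [ "$num_machines" -gt 1 ]; then
    args+=(--num_machines "$num_machines" --machine_rank "$machine_rank")
  fi
  printf '%s\n' "${args[*]}"
}

# Single machine: the command is printed (not executed) without the multi-node flags.
# train.py is a placeholder script name.
echo "accelerate launch $(build_launch_args 1 0 4) train.py"
```

With two machines, `build_launch_args 2 1 8` would additionally emit `--num_machines 2 --machine_rank 1`.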

If you don't have a config file and just pass in `--multi_gpu` it will be just fine. You also can pass in `--num_processes {x}` which will help. To point to...

This looks similar to an issue reported on PyTorch: https://github.com/pytorch/pytorch/issues/116056. Personally, I recommend just using WSL instead.

We can look into this, though I don't think we support onnxruntime? Where did you see that? 🤔

Colab supporting multiple GPUs is quite new, so thanks for letting us know that this is possible now. This was the behavior before because Colab did...

Yes that does indeed need updating. Looking forward to the PR!

The torch_xla team is aware of this and working towards fixing it