Zach Mueller
@faaany lmk if this is good to merge
If you mean for training, no, that's not supported.
We support that via DeepSpeed/FSDP weight offloading. We're looking into native pipeline parallelism soon.
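For the FSDP route, a minimal sketch of what weight offloading could look like (assuming the `cpu_offload` field on Accelerate's FSDP plugin; check your version's docs for the exact options):

```python
# Sketch: offload FSDP-sharded weights to CPU via Accelerate.
# Assumes FullyShardedDataParallelPlugin exposes a cpu_offload field that
# takes torch's CPUOffload config -- verify against your Accelerate version.
from torch.distributed.fsdp import CPUOffload
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    cpu_offload=CPUOffload(offload_params=True),  # park parameters on CPU between uses
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```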
No, we do not.
Is that the full error trace? It seems like some of it may be cut off
It is the total number of GPUs; we then reduce it by `num_machines`. (That SLURM example may possibly be wrong.)
I'm stating that the launcher will reduce it. `--num_processes` is the *total* number of GPUs and assumes each node has the same number of GPUs. So rather than `--nproc_per_node=2`...
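To make the arithmetic concrete, a purely illustrative sketch (the variable names are mine, not launcher internals):

```python
# Illustrative only: how the launcher derives the per-node worker count from
# the flags above. These variables just mirror the CLI flags.
num_processes = 4  # --num_processes: total GPUs across all nodes
num_machines = 2   # --num_machines: number of nodes

# Each node is assumed to have the same GPU count, so the launcher starts
# num_processes // num_machines workers per node -- the value torchrun would
# take as --nproc_per_node.
procs_per_node = num_processes // num_machines
print(procs_per_node)  # 2
```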
`num_processes` and `process_index` get their information from `torch.distributed.get_world_size()` and `torch.distributed.get_rank()`. `if accelerator.is_main_process` should only run on the main node and its first process. `is_local_main_process` would run 4 times, one...
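As a rough sketch of the difference (assuming, say, a 4-node setup so `is_local_main_process` fires once per node):

```python
# Sketch: global vs. local main-process checks in Accelerate.
# With 4 nodes, the first block runs exactly once (global rank 0), while the
# second block runs 4 times -- once on the first process of each node.
from accelerate import Accelerator

accelerator = Accelerator()

if accelerator.is_main_process:
    # Only the first process on the main node reaches here.
    print(f"global main: rank {accelerator.process_index} of {accelerator.num_processes}")

if accelerator.is_local_main_process:
    # The first process on every node reaches here.
    print(f"local main: local rank {accelerator.local_process_index}")
```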
Please provide us with your code and the full stack trace/error log.
Might be good to have this as an alternative choice. From their docs: MS-AMP has the following benefits compared with Transformer Engine: speed up memory-limited operations by accessing one byte...
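If it were integrated, usage might end up looking something like the existing FP8 kwargs handler with a backend switch; purely a hypothetical sketch, and the `backend`/`opt_level` names here are assumptions rather than a confirmed API:

```python
# Hypothetical: choosing MS-AMP instead of Transformer Engine as the FP8
# backend. The backend/opt_level arguments are assumptions for illustration,
# not a confirmed Accelerate signature.
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

fp8_handler = FP8RecipeKwargs(backend="msamp", opt_level="O2")  # or backend="te"
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_handler])
```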