Sylvain Gugger comments

Results 631 comments of



Model Parallelism and accelerate's usage of DDP aren't compatible

You can use DDP if your model is only on one device like this.

Model Parallelism and accelerate's usage of DDP aren't compatible

Then you cannot use DDP + `device_map="auto"`. You need to use DeepSpeed or FSDP.

Model Parallelism and accelerate's usage of DDP aren't compatible

I feel like you are not listening. You cannot use `DDP + device_map="auto"`, and thus not `DDP + device_map="auto" + DeepSpeed` either. You need to just use DeepSpeed ZeRO-3...

Model Parallelism and accelerate's usage of DDP aren't compatible

Yes, as long as you properly configure DeepSpeed ZeRO-3, you won't need to use `device_map="auto"`, and the model will be loaded across several GPUs (each weight will be split).
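For illustration, here is a minimal sketch of what a ZeRO-3 configuration could look like when written out from Python. Only `"stage": 3` reflects the comment above; the batch size value is an assumption, and a real setup will likely need more keys depending on how DeepSpeed is launched.

```python
import json

# Minimal DeepSpeed config sketch. Stage 3 partitions parameters, gradients
# and optimizer states across GPUs, which is why device_map="auto" is not needed.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,  # assumed value, adjust to your setup
}

# Serialized form, ready to be written to a JSON file of your choosing.
config_text = json.dumps(ds_config, indent=2)
```

The resulting JSON can be saved to a file and referenced from your DeepSpeed or Accelerate configuration.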

My validation set is too large. I want to randomly sample 0.1 during validation.

This is linked to your dataset, not Accelerate.
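Since the subsampling happens on the dataset side, one possible approach is to draw a random 10% of the indices before building the validation dataloader. This is only a sketch: `sample_indices` is a hypothetical helper, and the dataset length of 1000 is made up for the example.

```python
import random

def sample_indices(dataset_len, fraction=0.1, seed=0):
    """Return a sorted random subset of indices covering `fraction` of the dataset."""
    # Keep at least one sample for non-empty datasets, never more than available.
    k = min(dataset_len, max(1, int(dataset_len * fraction)))
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(dataset_len), k))

indices = sample_indices(1000)
# The indices can then select a subset of the validation set,
# e.g. with torch.utils.data.Subset(val_dataset, indices).
```

When running several processes, use the same seed in each of them so that they all agree on the subset.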

Loading the entire dataset into memory

You can use the `dispatch_batches=True` option and only load your dataset in process 0 (loading something with the same length but no real samples in it in the other...
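The idea of a stand-in with the same length but no real samples could be sketched as follows. `PlaceholderDataset` and `build_dataset` are hypothetical names, and the sketch assumes the dataset length is known without loading the data. In a real script, `is_main_process` would come from the `Accelerator` object created with `dispatch_batches=True`.

```python
class PlaceholderDataset:
    """Map-style dataset that reports a length but holds no real samples."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        return None  # placeholder item, not meant to be consumed

def build_dataset(is_main_process, load_real_dataset, length):
    """Load the real data only in the main process, a placeholder elsewhere."""
    if is_main_process:
        return load_real_dataset()
    return PlaceholderDataset(length)

# Toy usage: the lambda stands in for an expensive loading function.
real = build_dataset(True, lambda: [10, 20, 30], 3)
stub = build_dataset(False, lambda: [10, 20, 30], 3)
```

Only the main process pays the memory cost of the real dataset, while every process still sees the same dataset length.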

Failing to load model using accelerate launch

The machine you are using lacks the necessary amount of RAM to load the model in the 4 processes at the same time (you need 4 times the amount of...

Failing to load model using accelerate launch

Yes, for DDP with 4 GPUs you need 4x the size of the model in CPU RAM. Note that training with Adam usually requires 4x the size of...
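As a rough worked example of the CPU RAM requirement, assume every process loads its own full copy of the weights in fp32 (4 bytes per parameter). The 7-billion-parameter model below is hypothetical and only serves to make the arithmetic concrete.

```python
def ddp_cpu_ram_bytes(num_parameters, bytes_per_parameter, num_processes):
    """CPU RAM needed when every process loads its own full copy of the model."""
    return num_parameters * bytes_per_parameter * num_processes

# Hypothetical model: 7 billion parameters in fp32, launched on 4 processes.
one_copy = ddp_cpu_ram_bytes(7_000_000_000, 4, 1)
required = ddp_cpu_ram_bytes(7_000_000_000, 4, 4)
```

Here one copy takes 28 GB and four processes together need 112 GB of CPU RAM just to load the weights.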

Failing to load model using accelerate launch

The sharding strategy is optimizer-only though, from what I see in your training arguments. You need to shard the model as well (cc @pacman100 who will know more on...

Can't use accelerate to launch two programs on one machine

cc @muellerzr