Zach Mueller
Let me know if that does anything please 🙏
Oh boy, okay. Well, that was a thought 😢 CC @stas00 if I misunderstood anything? (See the nccl issue)
How are you launching the python script in your bash?
You're launching with `python`. Use either `accelerate launch` or `torch.distributed.run` instead; otherwise you'll get model parallelism (which isn't what you're aiming for).
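A quick way to confirm which launch path a script actually took is to look at the environment it was started in. The sketch below is only an illustration: it assumes the launcher exports the torchrun-style variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each worker, and the helper name `launched_distributed` is hypothetical, not part of Accelerate or PyTorch.

```python
import os
from typing import Mapping, Optional

# Variables that torch.distributed.run exports for every worker process.
# Assumption: `accelerate launch` in multi-GPU mode exposes the same ones.
_LAUNCHER_VARS = ("RANK", "LOCAL_RANK", "WORLD_SIZE")


def launched_distributed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment looks like a multi-process launch."""
    env = os.environ if env is None else env
    # A plain `python script.py` run normally has none of these set.
    if not all(name in env for name in _LAUNCHER_VARS):
        return False
    try:
        return int(env["WORLD_SIZE"]) > 1
    except ValueError:
        # A malformed value is treated as "not launched by a launcher".
        return False


if __name__ == "__main__":
    if launched_distributed():
        print("Started by a distributed launcher.")
    else:
        print("Started as a single process (plain `python`?).")
```

Dropping that check at the top of a (hypothetical) `train.py` and running it once as `python train.py` and once as `accelerate launch train.py` or `python -m torch.distributed.run --nproc_per_node 2 train.py` should make the difference visible.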
Accelerate should handle most of this now, cc @SunMarc if you want to give this a try!
@janboeye yes, PyTorch does not have mixed precision support on MPS at this time.
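Until that changes, one workaround is to fall back to full precision whenever the device is MPS. This is a rough sketch under that assumption only: `pick_mixed_precision` is a hypothetical helper, and the string values are the ones Accelerate accepts for its `mixed_precision` setting (`"no"`, `"fp16"`, `"bf16"`, `"fp8"`).

```python
# Values accepted by Accelerate's `mixed_precision` setting.
_VALID_MODES = {"no", "fp16", "bf16", "fp8"}


def pick_mixed_precision(device_type: str, requested: str = "fp16") -> str:
    """Return the mixed precision mode to use for a given device type.

    Assumption (from the comment above): MPS has no mixed precision
    support at this time, so it always falls back to "no".
    """
    if requested not in _VALID_MODES:
        raise ValueError(f"Unknown mixed precision mode: {requested!r}")
    if device_type == "mps":
        return "no"
    return requested
```

The result could then be passed along as, for example, `Accelerator(mixed_precision=pick_mixed_precision(device_type))`, where `device_type` is whatever device string your script already resolved.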
Overall I can see this being a pretty nice idea. I left a few nits with suggested improvements. cc @SunMarc for your thoughts as well :)
At this time we do not support multiple models with `deepspeed`, please see: https://github.com/huggingface/accelerate/issues/2496