Eric Mitchell
Got it - that's what I thought, but I wasn't sure how to let FSDP instantiate the model across several devices when doing the initial load on a meta device +...
@sgugger @pacman100 After some more experimentation, I think this *almost* gets the job done:

```
with init_empty_weights():
    policy = transformers.AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-6.9b", cache_dir=get_cache_dir())

def reset_parameters(self) -> None:
    pass  # dummy function for...
```
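For anyone following along: I believe the dummy `reset_parameters` is needed because FSDP's default meta-device materialization calls `reset_parameters()` on each module it materializes. A rough, untested sketch of the wrap step I have in mind, assuming the dummy method has been attached to the relevant module classes, one GPU per rank, and an already-initialized process group; note the parameters end up uninitialized, which is why this only *almost* works:

```
# Untested sketch: FSDP materializes the meta-device modules on the local GPU
# and calls the (dummy) reset_parameters() on each one, so nothing crashes,
# but the resulting weights are uninitialized.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

fsdp_policy = FSDP(
    policy,                                 # the meta-device model from the snippet above
    device_id=torch.cuda.current_device(),  # assumes one GPU per rank
)
```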
To update on this, I think memory-efficient initialization with regular torch FSDP is possible (rough sketch below) by:

- Only loading model parameters on the rank 0 device (load on the meta...
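Roughly what I have in mind, as an untested sketch (assumes a single node with one GPU per rank, the NCCL backend, and the usual torchrun environment variables; the model name is just for illustration):

```
import torch
import torch.distributed as dist
import transformers
from accelerate import init_empty_weights
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model_name = "EleutherAI/pythia-6.9b"
if rank == 0:
    # Only rank 0 actually loads the pretrained weights into CPU memory.
    model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
else:
    # All other ranks build the architecture on the meta device (no real memory used).
    config = transformers.AutoConfig.from_pretrained(model_name)
    with init_empty_weights():
        model = transformers.AutoModelForCausalLM.from_config(config)

model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,  # broadcast rank 0's real weights to the other ranks
    # Non-zero ranks hold meta tensors, so give FSDP a param_init_fn that just
    # allocates empty GPU storage; sync_module_states then fills it in from rank 0.
    param_init_fn=(None if rank == 0 else (lambda m: m.to_empty(device=torch.device("cuda")))),
)
```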
@pacman100 Sorry for the slow reply. I don't actually have a working example - this was just a hypothesis inspired by [this comment](https://github.com/pytorch/pytorch/blob/bffcfa9628d4c8e858ef5f2aeab34e021885e682/torch/distributed/fsdp/api.py#L302) in the PyTorch source. I will look into...
@pacman100 Sorry again for the delay. For an example of this approach and a discussion of some of the issues with it, check out https://github.com/pytorch/pytorch/issues/104026 on the PyTorch GitHub issues. I...
@ramit-wandb just wanted to check if there is any update on this issue! Thanks a lot.
Got it - maybe the blog post/docs [here](https://wandb.ai/stacey/nlg/reports/Tables-Tutorial-Visualize-Text-Data-Predictions---Vmlldzo1NzcwNzY) should be updated if this feature doesn't work for now, then?
Is there any update on this PR? I'm currently struggling to respond gracefully to CUDA OOM errors in a Jupyter notebook, and this fix looks promising.
Would love to know if there is any update on this issue @BlackSamorez. `tensor_parallel` works great for us for training (nice job!), but the inability to actually sample from the...
Hi! Sorry for the delayed response. If you run the eval scripts, you should get the same samples that we did, since the random seed is set. Or am I...
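For reference, by "the random seed is set" I mean seeding along these lines at the start of the eval script (illustrative only; the actual scripts may differ in the details):

```
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs so sampling is reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)
```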