accelerate
Feature Request: Device mapping for models that aren't sharded
If possible, this feature would be nice for loading models that are too large for CPU or GPU RAM alone: with device mapping, the model would be split across both, and spill over to the hard drive if needed.
Yes, that's exactly what Accelerate does for both sharded and non-sharded models. I'm not sure what feature you feel is missing; could you share some code?
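For reference, here is a minimal sketch of how this device mapping can be requested when loading a checkpoint through Transformers; the checkpoint name and offload folder below are only placeholders:

```python
# Sketch: ask Accelerate (via Transformers) to split a model across
# GPU, CPU RAM, and disk. Model name and offload folder are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-2.7B"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # let Accelerate place layers on GPU/CPU/disk
    offload_folder="offload",   # weights that fit nowhere else are offloaded here
    torch_dtype=torch.float16,  # halve memory use where the hardware supports it
)

# hf_device_map shows which device each module ended up on
print(model.hf_device_map)
```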
I apologize for the delay, but this should capture all of my issues. I'm not sure whether it's a problem with the model I'm using, but GPT-Neo barely touches GPU memory, while sharded OPT with the same parameter count works just fine. It may not be the best example, but take a look: https://colab.research.google.com/gist/JD-The-65th/3df3077443d48b2015b18c8ca9e0cc70/accelerate_opt.ipynb#scrollTo=Q0Zf_d5RhVpO
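In case it helps with debugging, one way to check why GPT-Neo barely uses the GPU is to pass explicit per-device memory budgets and then inspect the resulting device map; the checkpoint name and memory sizes below are illustrative assumptions, not taken from the notebook:

```python
# Sketch: cap per-device memory so the automatic device map is forced to fill
# the GPU before spilling to CPU RAM or disk. Budgets here are examples only.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",                # example non-sharded checkpoint
    device_map="auto",
    max_memory={0: "12GiB", "cpu": "24GiB"},  # budgets for GPU 0 and CPU RAM
    offload_folder="offload",
)
print(model.hf_device_map)  # compare this placement against the sharded OPT run
```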
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.