Why is `offload_folder` needed for inference of large models?
It does not make sense to me. Why does it load the model first and then offload some of its weights into files in a different place? Why not just load the weights as needed from the original files, or at least offer an option to do so?
That's because PyTorch does not let you load an individual weight from a checkpoint: the whole state dict is pickled as a single object, so it has to be deserialized in full before any one tensor can be accessed.
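A minimal stdlib sketch of the limitation described above, using a plain Python dict as a stand-in for a state dict (real PyTorch checkpoints pickle tensors the same way): `pickle` deserializes the entire object graph in one pass, so every weight is materialized in memory even if you only need one.

```python
import io
import pickle

# Stand-in for a model state dict: parameter name -> weight data.
state_dict = {f"layer{i}.weight": [0.0] * 1000 for i in range(10)}

# Serialize it the way torch.save conceptually does (one pickled object).
buf = io.BytesIO()
pickle.dump(state_dict, buf)
buf.seek(0)

# pickle.load returns the whole dict in one shot; there is no API to
# deserialize just "layer3.weight" without materializing the rest.
loaded = pickle.load(buf)
print(len(loaded))  # all 10 entries are now in memory
```

This is why accelerate cannot stream individual weights out of a `.bin` checkpoint and instead re-serializes them into `offload_folder` in a per-tensor layout it can read lazily.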
Sounds like it would make sense to distribute larger models in a raw binary format instead of PyTorch's pickle format. Right now, offload scenarios double the storage requirements because of that issue.
Alternatively, could you make it possible to download the model in PyTorch format and then convert it once to binary, so that accelerate can read directly from `offload_folder`?
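The conversion proposed above can be sketched with the stdlib alone. This is a hypothetical illustration, not accelerate's actual on-disk format: each weight is written to its own raw binary file, so a single tensor can later be read (or memory-mapped) on demand without unpickling the whole checkpoint.

```python
import array
import os
import tempfile

# Hypothetical offload layout: one raw little-endian float64 file per weight.
offload_folder = tempfile.mkdtemp()
weights = {"layer0.weight": [1.0, 2.0], "layer1.weight": [3.0, 4.0, 5.0]}

# One-time conversion: dump each weight to its own file.
for name, values in weights.items():
    with open(os.path.join(offload_folder, name + ".bin"), "wb") as f:
        array.array("d", values).tofile(f)

# Later, load exactly one weight on demand, touching nothing else on disk.
def load_weight(name, count):
    with open(os.path.join(offload_folder, name + ".bin"), "rb") as f:
        a = array.array("d")
        a.fromfile(f, count)
        return list(a)

print(load_weight("layer1.weight", 3))  # [3.0, 4.0, 5.0]
```

Per-tensor files make random access trivial at the cost of many small files; a single indexed container file (header mapping names to byte offsets) achieves the same lazy loading with less filesystem overhead.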
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.