
Why is `offload_folder` needed for inference of large models?

Open lostmsu opened this issue 3 years ago • 3 comments

It does not make sense to me. Why does it load the model first and then offload some of its weights into files in a different place? Why not just load the weights as needed from the original files, or at least have an option to do so?
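
For context, `offload_folder` shows up in calls like the following (a minimal sketch using accelerate's `init_empty_weights` and `load_checkpoint_and_dispatch`; the model id and local checkpoint path are illustrative):

```python
# Minimal sketch of the big-model-inference flow being discussed; the model id
# and local checkpoint path are illustrative.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom-7b1")
with init_empty_weights():
    # Instantiate the architecture on the "meta" device: no weights in RAM yet.
    model = AutoModelForCausalLM.from_config(config)

model = load_checkpoint_and_dispatch(
    model,
    "bloom-7b1-checkpoint/",   # local folder holding the downloaded .bin shards
    device_map="auto",         # split modules across GPU, CPU and disk
    offload_folder="offload",  # weights that fit nowhere else get re-written here
)
```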

lostmsu avatar Jul 23 '22 19:07 lostmsu

That's because PyTorch does not let you load an individual weight from a checkpoint: the whole state dict is pickled, so it has to be deserialized in one piece.
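
A quick illustration of what that means in practice: with a standard `.bin` checkpoint, `torch.load` deserializes the entire pickled state dict before any single tensor is accessible.

```python
import torch

# The whole pickled state dict is deserialized into memory at once; there is no
# way to pull out just one tensor from a .bin checkpoint through this path.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
single_weight = state_dict["transformer.word_embeddings.weight"]  # illustrative key
```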

sgugger avatar Jul 26 '22 11:07 sgugger

Sounds like it would make sense to distribute larger models in a binary format that can be read one weight at a time, instead of the pickled PyTorch format. Right now, offload scenarios double the storage requirements because of that issue.

lostmsu avatar Jul 26 '22 16:07 lostmsu

Alternatively, could you make it possible to download the model in PyTorch format and then convert it once into a binary layout that accelerate can read directly from `offload_folder`?
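
For reference, the offload folder already uses a flat per-tensor binary layout, roughly one raw `.dat` file per weight plus an `index.json` with its dtype and shape, so reading a single weight back looks something like this sketch (the exact layout can vary between accelerate versions, and the key name is illustrative):

```python
import json
import numpy as np
import torch

# Rough sketch of reading one tensor straight from an offload folder: each
# offloaded weight is a raw .dat file described by index.json (dtype, shape).
# Layout details can differ between accelerate versions.
offload_folder = "offload"
with open(f"{offload_folder}/index.json") as f:
    index = json.load(f)

name = "transformer.h.0.mlp.dense_h_to_4h.weight"  # illustrative key
meta = index[name]
array = np.memmap(
    f"{offload_folder}/{name}.dat",
    dtype=meta["dtype"],
    shape=tuple(meta["shape"]),
    mode="r",
)
weight = torch.from_numpy(np.array(array))  # one tensor loaded on demand
```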

lostmsu avatar Aug 04 '22 16:08 lostmsu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Aug 29 '22 15:08 github-actions[bot]