torchchat

[Feature request] Make GGUF load lazy

Open · metascroy opened this issue 10 months ago · 2 comments

When calling `generate` with a PTE or DSO, a `--gguf-path` is still passed to initialize the model, even though it is only used to get the weights. For checkpoints this is fine, because they are loaded lazily with mmap. With GGUF, however, all the weights are actually loaded into memory and then ignored, because the PTE/DSO is what runs.

Proposal: make the `state_dict` returned by the GGUF loader lazy.
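A minimal sketch of what a lazy `state_dict` could look like, in Python and using only the standard library. It assumes a hypothetical `layout` table mapping each tensor name to a byte offset and a float32 element count, standing in for the tensor-info section a real GGUF parser would read. `LazyStateDict` and `lazy_state_dict_from_file` are illustrative names, not torchchat or `gguf` APIs. The file is memory-mapped, and an entry is only turned into a zero-copy `memoryview` when its key is first accessed. A caller that never reads the weights (as with a PTE or DSO) therefore never pages them in. Quantized GGUF tensor types and conversion to `torch.Tensor` are left out.

```python
import mmap
import struct
import tempfile
from collections.abc import Mapping
from typing import Callable, Dict, Tuple


class LazyStateDict(Mapping):
    """Hypothetical lazy state dict: maps tensor names to loader callables.
    A tensor is materialized on first access to its key, then cached."""

    def __init__(self, loaders: Dict[str, Callable[[], memoryview]]):
        self._loaders = loaders
        self._cache: Dict[str, memoryview] = {}

    def __getitem__(self, key: str) -> memoryview:
        if key not in self._cache:
            # Unknown names raise KeyError, just like a plain dict.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)

    def loaded_keys(self) -> set:
        return set(self._cache)


def lazy_state_dict_from_file(
    path: str, layout: Dict[str, Tuple[int, int]]
) -> LazyStateDict:
    """`layout` is an assumed {name: (byte_offset, num_float32)} table,
    not the real GGUF header format."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def make_loader(offset: int, count: int) -> Callable[[], memoryview]:
        # Zero-copy float32 view into the mapping; the OS reads the
        # underlying pages only when the values are actually touched.
        return lambda: memoryview(mm)[offset:offset + 4 * count].cast("f")

    return LazyStateDict(
        {name: make_loader(off, n) for name, (off, n) in layout.items()}
    )


if __name__ == "__main__":
    # Write two toy float32 "tensors" back to back into a temp file.
    values = {"tok_embeddings.weight": [1.0, 2.0, 3.0], "output.weight": [4.0, 5.0]}
    layout, offset = {}, 0
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
        for name, vals in values.items():
            tmp.write(struct.pack(f"={len(vals)}f", *vals))
            layout[name] = (offset, len(vals))
            offset += 4 * len(vals)

    sd = lazy_state_dict_from_file(tmp.name, layout)
    print(len(sd), sd.loaded_keys())   # 2 set()
    print(list(sd["output.weight"]))   # [4.0, 5.0]
    print(sd.loaded_keys())            # {'output.weight'}
```

Keeping the loaders as closures means building the dict costs only the header parse; the design choice is that laziness lives in the mapping, so code that iterates keys or checks `len()` still works without forcing any tensor to load.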

Note that this is not an issue for export or eager `generate`.

metascroy, Apr 18 '24 15:04