Is there support for loading a sharded GGUF file?
Is your feature request related to a problem? Please describe. Inquiring whether this project supports loading a "sharded" GGUF model file. The llama.cpp project appears to have added tooling for splitting GGUF files into pieces (more here). I was curious whether this project supports loading GGUF files in that format, since I didn't see any mention of it in the documentation or issues.
If it is supported, could you point me to the documentation on this or provide a code example? If not, perhaps this feature could be added?
@jharonfe I haven't tested it personally, but according to the linked discussion it's automatically detected by llama_load_model_from_file, which llama-cpp-python uses.
One caveat is that this probably doesn't work with .from_pretrained yet, because that method looks for a single file to pull via the huggingface_hub library. I think adding an option like additional_files there would be good; I'll look into it.
One thing that would help is a link to a small model that's been split and uploaded, preferably <7B.
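In the meantime, if the shards are already on disk, loading locally should just be a matter of pointing Llama at the first split file; per the linked llama.cpp discussion, the loader detects the remaining shards in the same directory. Untested sketch, with a placeholder path:

from llama_cpp import Llama

# Point at the first shard; llama.cpp should pick up the remaining
# *-0000N-of-0000M.gguf files in the same directory automatically.
llm = Llama(
    model_path="./models/model-00001-of-00006.gguf",  # placeholder path
    n_gpu_layers=-1,
    verbose=True,
)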
I tested this using the latest version, 0.2.61, and the model appears to load correctly. Thanks for the feedback on this.
I just hit this issue today. I had tried using a wildcard to specify all of the files, but it then complained about seeing multiple files. An option like additional_files would be a nice quality-of-life change.
Prototype: https://github.com/abetlen/llama-cpp-python/commit/0e67a832bdda3d7e73cbcb162e93d5455503ca57 Will test and open PR if it works.
PR opened: https://github.com/abetlen/llama-cpp-python/pull/1457
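For anyone who wants to try it early, the intended usage would look roughly like this (a sketch only; the additional_files argument and its pattern semantics are provisional until the PR is merged):

from llama_cpp import Llama

# Provisional: download the first shard by name pattern and list the
# remaining shards via additional_files so they end up in the same cache.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",  # example repo from this thread
    filename="*IQ4_XS*-00001-of-*.gguf",
    additional_files=["*IQ4_XS*"],
    n_gpu_layers=-1,
    verbose=True,
)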
I would appreciate this feature! Currently I do something very hacky like the snippet below, and it works (once all files are downloaded), but it's pretty clunky:
from llama_cpp import Llama

# Hack: pull each shard by matching its filename pattern; the load
# attempts fail until every shard is present locally, so errors are
# simply swallowed and the last successful call sets llm.
for i in range(1, 6 + 1):  # assuming the model is split into 6 shards
    try:
        llm = Llama.from_pretrained(
            repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
            filename=f"*IQ4_XS-{i:05d}*",
            verbose=True,
            n_gpu_layers=-1,
        )
    except Exception:
        pass
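A somewhat less clunky workaround in the meantime (a sketch, assuming the same repo and quantization pattern as above) is to fetch all shards with huggingface_hub.snapshot_download and then point Llama at the first split file directly:

from pathlib import Path
from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Download every shard of the chosen quantization in one call.
local_dir = snapshot_download(
    repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
    allow_patterns=["*IQ4_XS*"],
)

# Load from the first split; llama.cpp detects the remaining shards.
# The glob below assumes the usual -00001-of-000NN naming.
first_shard = sorted(Path(local_dir).glob("*IQ4_XS*-00001-of-*.gguf"))[0]
llm = Llama(model_path=str(first_shard), n_gpu_layers=-1, verbose=True)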
Well, my PR has been sitting there for a month now. You might try patching the code additions in yourself to get rid of the jank.