
gguf reader for layer and size estimates

Open · earonesty opened this issue on Sep 14, 2023 · 2 comments

I've found that without some sort of layer and size estimate it's very hard to choose the right number of layers to offload.

todo:

  • get a size estimate based on needed context size! (see the rough sketch below)
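
As a rough illustration of the kind of estimate I mean (just a sketch, not code from this repo): the KV cache for llama-style models grows as 2 × layers × context length × embedding width × bytes per element. The metadata keys named in the comments are the usual GGUF keys for llama-arch models; the fp16 cache element size is an assumption.

```python
# Back-of-envelope KV-cache estimate: 2 tensors (K and V) per layer,
# each n_ctx x n_embd_kv elements.  The numbers in the example are
# illustrative for a 7B llama-style model; real values would come from
# GGUF metadata (llama.block_count, llama.embedding_length,
# llama.attention.head_count_kv for GQA models).

def kv_cache_bytes(n_ctx: int, n_layer: int, n_embd_kv: int,
                   bytes_per_elem: int = 2) -> int:
    """Approximate K+V cache size in bytes (fp16 cache assumed by default)."""
    return 2 * n_layer * n_ctx * n_embd_kv * bytes_per_elem

# 32 layers, 4096-wide embeddings, no GQA, 4k context -> 2048 MiB
print(kv_cache_bytes(n_ctx=4096, n_layer=32, n_embd_kv=4096) / 2**20, "MiB")
```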

If you think this should be its own repo, I'm cool with that.

earonesty avatar Sep 14 '23 14:09 earonesty

Hey @earonesty this makes sense and I do want to integrate gguf more closely into llama-cpp-python. Is it possible to use the pip published gguf package to reduce the amount of maintenance required when that's updated?

abetlen avatar Sep 30 '23 06:09 abetlen

Unfortunately that package has no reader support. I used its source to reverse engineer the format and write the reader! Happy to put it in its own repo, but I don't think the llama-cpp team has plans to maintain the reader.
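
For context, the reader basically just walks the header and the metadata key/value section. Here is a minimal sketch based on the GGUF v2 spec in the llama.cpp repo (type codes and layout taken from that spec; this is only an illustration, not the actual reader mentioned above):

```python
import struct

# Scalar metadata value-type codes from the GGUF v2+ spec: code -> (struct fmt, size)
_SCALARS = {
    0: ("<B", 1), 1: ("<b", 1), 2: ("<H", 2), 3: ("<h", 2),
    4: ("<I", 4), 5: ("<i", 4), 6: ("<f", 4), 7: ("<?", 1),
    10: ("<Q", 8), 11: ("<q", 8), 12: ("<d", 8),
}

def _read_string(f):
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")

def _read_value(f, vtype):
    if vtype in _SCALARS:
        fmt, size = _SCALARS[vtype]
        return struct.unpack(fmt, f.read(size))[0]
    if vtype == 8:  # string
        return _read_string(f)
    if vtype == 9:  # array: element type, element count, then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF metadata value type {vtype}")

def read_gguf_metadata(path):
    """Return (tensor_count, metadata dict) from a GGUF v2+ file header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            meta[key] = _read_value(f, vtype)
        return tensor_count, meta

# e.g. meta.get("general.architecture") tells you which "<arch>." prefix the
# other keys use, and meta.get("llama.block_count") gives the layer count to
# split between GPU offload and CPU.
```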

I can try to submit a PR and see if they like it?

earonesty avatar Sep 30 '23 17:09 earonesty