llama.cpp
Wizard Coder 15b Support?
I have tried running the GGML version of it but it gives this error:
main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --repeat_penalty 1.2 --instruct --color --memory_f32 -m WizardCoder-15B-1.0.ggmlv3.q4_0.bin
main: build = 686 (ac3b886)
main: seed = 1686975019
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4050 Laptop GPU
llama.cpp: loading model from WizardCoder-15B-1.0.ggmlv3.q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder.
Llama.cpp doesn't support it yet.
but it is not llama.cpp ;)
Can anyone explain how to use another model, such as WizardVicuna, with privateGPT? Is that model supported?
I cannot make it work with starcoder.cpp. I downloaded the 4-bit ggml model from Hugging Face, but it gives a ggml error.
./main -m ./models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin -p "def fibonacci(" --temp 0.2
Error:
main: seed = 1686965178
starcoder_model_load: loading model from './models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin'
starcoder_model_load: n_vocab = 49153
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 28956.48 MB
GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL
Aborted (core dumped)
More information: I have 16 GB of RAM, and the model file is about 11 GB, so it should fit into memory, if that were the issue?
This may not be the place to ask, but since you said you can run it, can you give me some help or a reference on what is going on?
Are you monitoring memory use when you run starcoder? Running the 14.3GB Q5_1 with 32GB of ram:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24811 root 20 0 29.1g 13.4g 4352 R 397.3 42.8 1:03.45 starcoder
From
starcoder_model_load: ggml ctx size = 28956.48 MB
GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL
it seems pretty likely that you are running out of memory.
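The log numbers line up with a back-of-the-envelope estimate: at n_ctx = 8192, n_layer = 40, n_embd = 6144, an f32 KV cache alone needs about 15 GB on top of the ~11 GB of weights. A rough sketch (the K+V cache formula below is the usual transformer layout, assumed rather than read from the starcoder example's source):

```python
# Rough ggml memory estimate for the q4_1 WizardCoder run above.
# KV cache: two f32 tensors (K and V), each n_ctx * n_layer * n_embd
# elements -- the standard layout, an assumption here.
n_ctx, n_layer, n_embd = 8192, 40, 6144
f32_bytes = 4

kv_cache_mb = 2 * n_ctx * n_layer * n_embd * f32_bytes / 1024**2
weights_mb = 11_000  # approximate q4_1 file size from the post above

print(f"KV cache:           {kv_cache_mb:.0f} MB")
print(f"KV cache + weights: {kv_cache_mb + weights_mb:.0f} MB")
```

About 15 GB of cache plus 11 GB of weights is close to the 28956.48 MB ctx size in the log, and well past 16 GB of physical RAM, so a failed allocation is expected.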
I don't think any of the mmap magic in llama.cpp has made it into ggml yet.
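For reference, the "mmap magic" means mapping the weights file into the process's address space so the OS pages data in lazily on first access, instead of copying the whole file into an allocated buffer up front. A minimal Python sketch of the idea, using a throwaway stand-in file rather than a real model (this is not the actual llama.cpp implementation):

```python
import mmap
import os
import tempfile

# Write a tiny stand-in "model file" -- in practice this would be the
# multi-gigabyte .bin weights file.
fd, path = tempfile.mkstemp()
os.write(fd, b"ggml" + bytes(1024))
os.close(fd)

# Map the file read-only: no upfront copy into a malloc'd buffer; the OS
# faults pages in on first access and can evict them under memory pressure.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:4]  # touching these bytes pages in the first chunk
    print(magic)    # b'ggml'
    mm.close()
os.remove(path)
```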
Thanks for the reply. Yes, it seems the model does not fit into memory. I assumed it would, since the file is smaller than my RAM, but apparently that is not the case with ggml. Good to know, thanks!
When will it be supported?
That model is better for coding than anything else available offline so far. It is at the level of GPT-3.5.
WizardCoder 15B is not a LLaMA-family model; its compute graph has several nodes that differ from LLaMA models.
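This is also why llama.cpp fails with "missing tok_embeddings.weight": that is the embedding tensor's name in LLaMA-format ggml files, and a StarCoder-based model simply doesn't contain it. Since ggml files store tensor names as plain strings, a crude heuristic is to grep the front of the file for that name to guess which loader to try. A hypothetical sketch, not a real format parser:

```python
def looks_like_llama(path: str) -> bool:
    """Heuristic: LLaMA-format ggml files embed the tensor name
    'tok_embeddings.weight' as a raw string near the start of the file.
    This only hints at the model family; it does not parse the format."""
    with open(path, "rb") as f:
        head = f.read(1 << 20)  # tensor metadata sits near the front
    return b"tok_embeddings.weight" in head
```

If this returns False for a .bin that llama.cpp rejects, trying a loader for another architecture (e.g. the ggml starcoder example) is the next step.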
@spikespiegel I cobbled together basic mmap (and gpu) support for the starcoder example if you'd like to test: https://github.com/johnson442/ggml/tree/starcoder-mmap
There is probably something wrong with it, but it seems to run ok for me on a system with 16GB of ram.