
Wizard Coder 15b Support?

Open Asory2010 opened this issue 1 year ago • 9 comments

I have tried running the GGML version of it but it gives this error:

    main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --repeat_penalty 1.2 --instruct --color --memory_f32 -m WizardCoder-15B-1.0.ggmlv3.q4_0.bin
    main: build = 686 (ac3b886)
    main: seed  = 1686975019
    ggml_init_cublas: found 1 CUDA devices:
      Device 0: NVIDIA GeForce RTX 4050 Laptop GPU
    llama.cpp: loading model from WizardCoder-15B-1.0.ggmlv3.q4_0.bin
    error loading model: missing tok_embeddings.weight
    llama_init_from_file: failed to load model

Asory2010 avatar Jun 16 '23 21:06 Asory2010

WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder.

Llama.cpp doesn't support it yet.

johnson442 avatar Jun 16 '23 21:06 johnson442

but it is not llama.cpp ;)

mirek190 avatar Jun 16 '23 22:06 mirek190

Can anyone explain how to use another model, such as WizardVicuna, with privateGPT? Is that model supported?

giridharreddy7 avatar Jun 17 '23 00:06 giridharreddy7

WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder.

Llama.cpp doesn't support it yet.

I cannot make it work with starcoder.cpp. I downloaded the 4-bit GGML model from Hugging Face, but it gives a GGML error:

    ./main -m ./models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin -p "def fibonacci(" --temp 0.2

Error:

    main: seed = 1686965178
    starcoder_model_load: loading model from './models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin'
    starcoder_model_load: n_vocab = 49153
    starcoder_model_load: n_ctx   = 8192
    starcoder_model_load: n_embd  = 6144
    starcoder_model_load: n_head  = 48
    starcoder_model_load: n_layer = 40
    starcoder_model_load: ftype   = 2003
    starcoder_model_load: qntvr   = 2
    starcoder_model_load: ggml ctx size = 28956.48 MB
    GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL
    Aborted (core dumped)

More information: I have 16 GB of RAM and the model is about 11 GB, so it should fit into memory, if that was the issue?

This may not be the place to ask, but since you said you can run it, can you give me some help or a reference on what is going on?

spikespiegel avatar Jun 17 '23 01:06 spikespiegel

Are you monitoring memory use when you run starcoder? Running the 14.3 GB Q5_1 with 32 GB of RAM:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                        
  24811 root      20   0   29.1g  13.4g   4352 R 397.3  42.8   1:03.45 starcoder 

From

    starcoder_model_load: ggml ctx size = 28956.48 MB
    GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL

it seems pretty likely that you are running out of memory.

I don't think any of the mmap magic in llama.cpp has made it into ggml yet.

johnson442 avatar Jun 17 '23 02:06 johnson442

Thanks for the reply. Yes, it seems the model does not fit into memory. I assumed it would fit into RAM since the file is smaller, but apparently that is not the case with ggml. Good to know, thanks!

spikespiegel avatar Jun 17 '23 02:06 spikespiegel

When will it be supported?

Asory2010 avatar Jun 17 '23 12:06 Asory2010

That model is better at coding than anything else available offline so far. It is at the level of GPT-3.5.

mirek190 avatar Jun 24 '23 22:06 mirek190

WizardCoder 15B is not a LLaMA-family model; its compute graph has several nodes that differ from LLaMA models.

howard0su avatar Jun 26 '23 15:06 howard0su

@spikespiegel I cobbled together basic mmap (and gpu) support for the starcoder example if you'd like to test: https://github.com/johnson442/ggml/tree/starcoder-mmap

There is probably something wrong with it, but it seems to run OK for me on a system with 16 GB of RAM.

johnson442 avatar Jun 30 '23 00:06 johnson442