johnson442
WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder. Llama.cpp doesn't support it yet.
Are you monitoring memory use when you run starcoder? Running the 14.3 GB Q5_1 with 32 GB of RAM:

```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND...
```
@spikespiegel I cobbled together basic mmap (and gpu) support for the starcoder example if you'd like to test: https://github.com/johnson442/ggml/tree/starcoder-mmap There is probably something wrong with it, but it seems to...
I needed to add `#include <climits>` (for `INT_MAX`) to train-text-from-scratch.cpp due to:

```
/tmp/mount/train-git/llama.cpp/examples/train-text-from-scratch/train-text-from-scratch.cpp:1512:37: error: ‘INT_MAX’ was not declared in this scope
 1512 |     GGML_ASSERT(size >= 0 && size < INT_MAX);
```
Your first run is converting to q4_1, not q4_k_s. If you built with CMake from current master there is no k_quants support; you can either add a line to CMakeLists...
Awesome! Is there a discussion somewhere about what shape adding new models to llama.cpp is going to take? I thought about making this PR against that repo but wasn't sure...
> ./bin/starcoder -ngl 20 -t 24 -b 64 -m /data/WizardCoder-15B-1.0-GGML/WizardCoder-15B-1.0.ggmlv3.q4_0.bin

Run `./starcoder-mmap` if you have built this branch.
When I try your prompt and parameters using either starcoder or starcoder-mmap the output is , so it appears unrelated to these changes. You can try a positive temperature if you...