Support for bigcode/starcoder
Hi!
I saw the example for the bigcode/gpt_bigcode-santacoder
model. I am wondering how I can run the bigcode/starcoder
model on CPU with a similar approach.
When I ran the following command:
python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
I encountered this error:
OSError: Consistency check failed: file should be of size 9904379239 but has size 2282030570 ((…)l-00001-of-00007.bin).
Any ideas or help would be greatly appreciated.
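For what it's worth, a size mismatch like this usually means a shard download was interrupted and a truncated file is sitting in the Hugging Face cache. A minimal sketch to force a clean re-download (assuming huggingface_hub is installed; the pattern list is illustrative):

# Force a clean re-download of the model shards; force_download discards
# any truncated files already in the cache. allow_patterns is illustrative.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bigcode/starcoder",
    allow_patterns=["*.bin", "*.json", "*.txt"],
    force_download=True,
)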
% git diff
diff --git a/examples/starcoder/main.cpp b/examples/starcoder/main.cpp
index c9d1d7e..1972732 100644
--- a/examples/starcoder/main.cpp
+++ b/examples/starcoder/main.cpp
@@ -18,11 +18,11 @@
// https://huggingface.co/bigcode/gpt_bigcode-santacoder/blob/main/config.json
struct starcoder_hparams {
int32_t n_vocab = 49280;
- int32_t n_ctx = 2048;
- int32_t n_embd = 2048;
- int32_t n_head = 16;
- int32_t n_layer = 24;
- int32_t ftype = 1;
+ int32_t n_ctx = 8192;
+ int32_t n_embd = 6144;
+ int32_t n_head = 48;
+ int32_t n_layer = 40;
+ int32_t ftype = 1;
};
struct starcoder_layer {
diff --git a/examples/starcoder/quantize.cpp b/examples/starcoder/quantize.cpp
index 101af50..09111bb 100644
--- a/examples/starcoder/quantize.cpp
+++ b/examples/starcoder/quantize.cpp
@@ -16,11 +16,11 @@
// default hparams (GPT-2 117M)
struct starcoder_hparams {
int32_t n_vocab = 49280;
- int32_t n_ctx = 2048;
- int32_t n_embd = 2048;
- int32_t n_head = 16;
- int32_t n_layer = 24;
- int32_t ftype = 1;
+ int32_t n_ctx = 8192;
+ int32_t n_embd = 6144;
+ int32_t n_head = 48;
+ int32_t n_layer = 40;
+ int32_t ftype = 1;
};
// quantize a model
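For reference, the values in this patch can be cross-checked against the model's config.json on the Hub. A quick sketch, assuming the GPT-BigCode config exposes the GPT-2-style attribute names used above:

# Print StarCoder's hyperparameters straight from its config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder")
for name in ("vocab_size", "n_positions", "n_embd", "n_head", "n_layer"):
    print(name, "=", getattr(config, name))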
Converted and quantized models can be found here:
https://huggingface.co/NeoDim/starcoder-GGML
https://huggingface.co/NeoDim/starcoderbase-GGML
https://huggingface.co/NeoDim/starchat-alpha-GGML
@s-kostyaev I don't think you need this patch; the correct parameters are loaded from the model file.
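One way to convince yourself of this is to read the hparams back out of the converted file. A minimal sketch, assuming the header layout written by the starcoder example's convert-hf-to-ggml.py (a "ggml" magic followed by six little-endian int32 values) and a hypothetical output path:

# Peek at the hparams stored in a converted GGML file. The header layout
# (magic + six int32 hparams) is an assumption based on the starcoder
# example's convert script; the file path below is hypothetical.
import struct

with open("models/starcoder-ggml.bin", "rb") as f:
    magic, n_vocab, n_ctx, n_embd, n_head, n_layer, ftype = struct.unpack(
        "<7i", f.read(7 * 4)
    )
assert magic == 0x67676D6C  # "ggml"
print(n_vocab, n_ctx, n_embd, n_head, n_layer, ftype)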
@ggerganov Ok. Why doesn't it work for @seyyedaliayati, then?
@ggerganov you are right. Without the patch everything works fine. Thank you for the information.
You said the correct parameters are loaded from the model file, so why do I get this error? If you need more information, please let me know. Thanks.
I just re-ran it and now I get this:
(base) ali@host:~/ggml$ python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
Loading model: bigcode/starcoder
Downloading (…)l-00001-of-00007.bin: 100%|█████████████████████████████████████████| 9.90G/9.90G [05:09<00:00, 32.0MB/s]
Downloading (…)l-00002-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:51<00:00, 33.8MB/s]
Downloading (…)l-00003-of-00007.bin: 100%|█████████████████████████████████████████| 9.85G/9.85G [04:59<00:00, 32.9MB/s]
Downloading (…)l-00004-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:53<00:00, 33.6MB/s]
Downloading (…)l-00005-of-00007.bin: 100%|█████████████████████████████████████████| 9.85G/9.85G [04:55<00:00, 33.3MB/s]
Downloading (…)l-00006-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:58<00:00, 33.1MB/s]
Downloading (…)l-00007-of-00007.bin: 100%|█████████████████████████████████████████| 4.08G/4.08G [01:56<00:00, 34.9MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [31:46<00:00, 272.30s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [03:42<00:00, 31.77s/it]
Killed
(base) ali@host:~/ggml$ python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
Loading model: bigcode/starcoder
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [03:38<00:00, 31.20s/it]
Killed
Is it because I am running on WSL?
I'm afraid you may be lacking system memory for the conversion.
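If adding RAM is not an option, loading the checkpoint in fp16 with low_cpu_mem_usage=True may roughly halve the peak memory. A sketch using standard transformers options; whether the conversion script then works unmodified on fp16 weights is untested:

# Reduce peak RAM while loading the checkpoint for conversion.
# These are standard transformers options, not part of the ggml scripts.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.float16,   # half the memory of fp32 weights
    low_cpu_mem_usage=True,      # avoid materializing a second full copy
)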
You are right. I increased my RAM and the issue is solved!