
how to generate "ggml-alpaca-7b-q4.bin" with LLaMa original "consolidated.00.pth"?

Open aicoat opened this issue 1 year ago • 4 comments

Is there any way to generate the 7B, 13B, or 30B models instead of downloading them? I already have the original models.

aicoat avatar Mar 25 '23 17:03 aicoat

@aicoat https://github.com/antimatter15/alpaca.cpp/blob/master/convert-pth-to-ggml.py

Qubitium avatar Mar 25 '23 19:03 Qubitium

@diegomontoya Already tried that. For 30B it gives me 4 parts, named ggml-model-f16.bin through ggml-model-f16.bin.3. I used quantize.exe to convert them to .bin, but at the end alpaca gave me a "bad magic" error:

D:\LLama_git\alpaca-win>chat -m D:\LLama_git\Models\30B\ggml-model-q4_0.bin
main: seed = 1679772113
llama_model_load: loading model from 'D:\LLama_git\Models\30B\ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'D:\LLama_git\Models\30B\ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'D:\LLama_git\Models\30B\ggml-model-q4_0.bin'

It should be one single .bin file, not multiple parts.
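For context, the "bad magic" message means the loader read the first four bytes of the file and they did not match the expected ggml magic value, so the file is either truncated, concatenated incorrectly, or not a ggml file at all. A minimal sketch of that kind of check (the constant 0x67676d6c spells "ggml"; `has_ggml_magic` is a hypothetical helper for illustration, not the actual loader code):

```cpp
#include <cstdint>
#include <cstdio>

// Sketch: read the first 4 bytes of a model file and compare them
// against the ggml magic (0x67676d6c, i.e. the bytes "ggml").
static bool has_ggml_magic(const char *path) {
    FILE *f = std::fopen(path, "rb");
    if (!f) return false;
    uint32_t magic = 0;
    size_t n = std::fread(&magic, sizeof(magic), 1, f);
    std::fclose(f);
    return n == 1 && magic == 0x67676d6c;
}
```

If this check fails on a file you just quantized, the usual cause is that the converter emitted multiple parts and the loader was only told about one of them.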

aicoat avatar Mar 25 '23 19:03 aicoat

You will have to re-compile 'chat' with line 34 changed to 4 (instead of 1), which is the number of .bin files for your generated 30B model.

Reference: https://github.com/antimatter15/alpaca.cpp/issues/94#issuecomment-1478362711

mastr-ch13f avatar Mar 25 '23 20:03 mastr-ch13f

@mastr-ch13f Thanks man, you saved me lots of GB! I changed it to this and it is now working for all models:

// determine number of model parts based on the dimension
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 2 },
    { 6656, 4 },
    { 8192, 8 },
};

aicoat avatar Mar 25 '23 21:03 aicoat