Using non-LoRA Alpaca model

Open mjorgers opened this issue 1 year ago • 11 comments

The following repo contains a recreation of the original weights for Alpaca, without using LoRA. How could we use that model with this project? https://github.com/pointnetwork/point-alpaca Thanks a bunch!

mjorgers avatar Mar 19 '23 20:03 mjorgers

You should theoretically be able to run the same convert and quantize scripts on that model and use them with llama.cpp.

thomasantony avatar Mar 20 '23 04:03 thomasantony

You should theoretically be able to run the same convert and quantize scripts on that model and use them with llama.cpp.

I tried to convert the recreated weights using the convert script but got the following error:

TypeError: Got unsupported ScalarType BFloat16
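From what I can tell, the error comes from numpy not supporting bfloat16 when the convert script turns tensors into arrays. One possible workaround (a rough, untested sketch; the shard names are the decrypted point-alpaca files and may differ in your copy) is to cast the checkpoints to float16 first and then re-run the convert script:

# Cast bfloat16 tensors to float16 so convert-pth-to-ggml.py can handle them.
# Sketch only: keep backups, this overwrites the files in place.
import torch

for name in ["pytorch_model-00001-of-00003.bin",
             "pytorch_model-00002-of-00003.bin",
             "pytorch_model-00003-of-00003.bin"]:
    sd = torch.load(name, map_location="cpu")
    sd = {k: (v.half() if isinstance(v, torch.Tensor) and v.dtype == torch.bfloat16 else v)
          for k, v in sd.items()}
    torch.save(sd, name)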

Be forewarned, I have absolutely no clue what I'm doing. I'm working on changing this, but in the meantime I (and many others, I imagine) would really appreciate guidance from those with the required know-how.

FYI, the counterpart to this question over at point-alpaca can be found here: https://github.com/pointnetwork/point-alpaca/issues/3

clulece avatar Mar 20 '23 19:03 clulece

Heya, do you mind laying out the steps you've done to get where you are now? I'm trying to do the same thing but can't get past the initial making-a-params-json-from-the-config-json hurdle.
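As far as I can tell, the 7B params.json only needs a handful of values pulled from config.json; my best guess (values assumed from the stock LLaMA 7B file, please correct me if they're off) is something like:

# Write a minimal params.json for the 7B model. Values assumed; vocab_size = -1
# because the convert script appears to fill it in from tokenizer.model.
import json

params = {
    "dim": 4096,
    "multiple_of": 256,
    "n_heads": 32,
    "n_layers": 32,
    "norm_eps": 1e-06,
    "vocab_size": -1,
}
with open("params.json", "w") as f:
    json.dump(params, f)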

adntaha avatar Mar 20 '23 20:03 adntaha

Sorry, but does anyone know how to merge the LoRA into the raw model?

FNsi avatar Mar 21 '23 03:03 FNsi

Heya, do you mind laying out the steps you've done to get where you are now? I'm trying to do the same thing but can't get past the initial making-a-params-json-from-the-config-json hurdle.

I just took the "decrypted" pytorch_model-*-of-00003.bin files, put them in models/7B, renamed them so that their names would align with what the scripts in this repo expect, and then ran the standard scripts unmodified.

Like I said, I'm pretty clueless when it comes to deep learning and what formats/conventions they use. I'll keep aimlessly banging my head against this until the non-LoRA Alpaca model works with llama.cpp. gjmulder removed the wontfix tag, which I take as an indication that proper support may be implemented.

In the unlikely case that I manage to get it working before official support is added, I promise to post how here.

clulece avatar Mar 22 '23 04:03 clulece

Heya, I've figured it out! I took Alpaca-LoRA's export_state_dict_checkpoint.py and adapted it a bit to fit our use case! Here's a link to my tweaked version: https://gist.github.com/botatooo/7ab9aa95eab61d1b64edc0263453230a

Steps:

  • Download tweaked export_state_dict_checkpoint.py and move it into point-alpaca's directory
  • Run it using python export_state_dict_checkpoint.py
  • Once it's done, you'll want to
    • create a new directory; I'll call it palpaca
    • rename ckpt to 7B and move it into the new directory
    • copy tokenizer.model from results into the new directory (sketched in code below).
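
In other words, roughly (paths taken from the steps above; adjust if your point-alpaca output lives elsewhere):

# Assemble the palpaca/ layout described above.
import os, shutil

os.makedirs("palpaca", exist_ok=True)
shutil.move("ckpt", "palpaca/7B")                   # rename ckpt -> palpaca/7B
shutil.copy("results/tokenizer.model", "palpaca/")  # tokenizer.model next to 7B/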

Your directory structure should now look something like this:

chat.py
encrypt.py
[... other point-alpaca files ...]
palpaca/
    7B/
        consolidated.00.pth
        params.json
    tokenizer.model

Note: You'll want to wait until https://github.com/ggerganov/llama.cpp/pull/428 gets merged, or fix the quantize script yourself

Now you can move palpaca into the llama.cpp folder and quantize it as you usually would:

python3 convert-pth-to-ggml.py palpaca/7B/ 1
python3 quantize.py -m palpaca 7B

# start inferencing!
./main -m ./palpaca/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins

I hope this was clear enough

adntaha avatar Mar 23 '23 20:03 adntaha

./main -m ./palpaca/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins

I was with you until this step. I'm getting the error below:

llama_model_load: loading model part 1/1 from './alpaca/7B/ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model './alpaca/7B/ggml-model-q4_0.bin'

This issue suggests I should recompile, which I've done. Is the problem with the parameters below?

llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1

srhm-ca avatar Mar 24 '23 03:03 srhm-ca

Huhhhhhh, I've just tried using LoRA and it worked; however, I got the same error you've gotten when trying with point-alpaca and the tweaked script, so I wonder if it has something to do with the arguments passed to from_pretrained...

adntaha avatar Mar 25 '23 04:03 adntaha

@sr-hm @botatooo

The cause of this is that the point-alpaca model in question has an added "[PAD]" token, so the resulting model contains 32001 tokens, but the vocab size was set to 32000, resulting in a mismatch between the tensor shapes and the number of tokens. You can edit the params and change it to 32001, but then it crashes (tokenizer.model does not have this token, the value isn't supported because it's not divisible by 256, and I also don't think you can have odd numbers because the byte-pair logic needs them to be n / 2?).

i "fixed" it by truncating the shape to 32000 thus yeeting the added token out of existence. it seems to work fine and the token is probably not used anyway, but if it is there is a chance the output could be affected in some way.

ugly hack for point-alpaca: https://gist.github.com/anzz1/6c0b38a1593879065b364bc02f2d3de4
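
In essence, the fix just slices the vocab-sized tensors down to 32000 rows. A rough standalone sketch of the same idea, applied to the already exported checkpoint rather than inside the export script (it assumes those tensors are named tok_embeddings.weight and output.weight there):

# Drop the extra [PAD] row so the tensor shapes match n_vocab = 32000.
# Sketch only: back up consolidated.00.pth before overwriting it.
import torch

path = "palpaca/7B/consolidated.00.pth"
sd = torch.load(path, map_location="cpu")
for key in ("tok_embeddings.weight", "output.weight"):
    if key in sd and sd[key].shape[0] == 32001:
        sd[key] = sd[key][:32000].clone()
torch.save(sd, path)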

anzz1 avatar Mar 25 '23 07:03 anzz1

I have developed the following prompt and script to use this model:

 Text transcript of a never ending dialog, where ${USER_NAME} interacts with an AI assistant named ${AI_NAME}.
${AI_NAME} is helpful, kind, honest, friendly, good at writing and never fails to answer ${USER_NAME}’s requests immediately and with details and precision.
There are no annotations like (30 seconds passed...) or (to himself), just what ${USER_NAME} and ${AI_NAME} say aloud to each other.
The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long.
If you are a doctor, please answer the medical questions based on the patient's description.

Doctor: I am Doctor, what medical questions do you have?

chat-doctor.tar.gz

xor2003 avatar Apr 02 '23 08:04 xor2003

Heya, I've figured it out! I took Alpaca-LoRA's export_state_dict_checkpoint.py and adapted it a bit to fit our use case! Here's a link to my tweaked version: https://gist.github.com/botatooo/7ab9aa95eab61d1b64edc0263453230a

Amazing! Where did you get tokenizer.model from?

larawehbe avatar May 23 '23 08:05 larawehbe