Using non-LoRA Alpaca model
The following repo contains a recreation of the original weights for Alpaca, without using LoRA. How could we use that model with this project? https://github.com/pointnetwork/point-alpaca Thanks a bunch!
You should theoretically be able to run the same convert and quantize scripts on that model and use them with llama.cpp.
I tried to convert the recreated weights using the convert script but got the following error:
TypeError: Got unsupported ScalarType BFloat16
Be forewarned, I have absolutely no clue what I'm doing. I'm working on changing this, but in the meantime I, and many others I imagine, would really appreciate guidance from those with the required know-how.
FYI, the counterpart to this question over at point-alpaca can be found here: https://github.com/pointnetwork/point-alpaca/issues/3
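A likely cause, going by the error message, is that numpy has no bfloat16 dtype, so the convert script fails the moment it calls .numpy() on a tensor stored as bfloat16. A minimal sketch of a workaround, assuming you're comfortable patching the script yourself (the helper name here is made up):

```python
import torch

def to_numpy_safe(tensor: torch.Tensor):
    # numpy cannot represent bfloat16, so upcast before calling .numpy()
    if tensor.dtype == torch.bfloat16:
        tensor = tensor.to(torch.float32)
    return tensor.numpy()
```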
Heya, do you mind laying out the steps you've done to get where you are now? I'm trying to do the same thing but can't get past the initial making-a-params-json-from-the-config-json hurdle.
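In case it helps with that hurdle, here's a rough sketch of one way to build params.json from config.json. The config.json field names are assumptions based on the Hugging Face LLaMA config, so double-check them against your own file:

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

params = {
    "dim": cfg["hidden_size"],              # 4096 for the 7B model
    "multiple_of": 256,                      # not stored in config.json; 256 for LLaMA
    "n_heads": cfg["num_attention_heads"],   # 32
    "n_layers": cfg["num_hidden_layers"],    # 32
    "norm_eps": cfg["rms_norm_eps"],         # 1e-06
    "vocab_size": cfg["vocab_size"],         # 32000 (point-alpaca may report 32001)
}

with open("params.json", "w") as f:
    json.dump(params, f, indent=2)
```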
Sorry, but does anyone know how to merge the LoRA weights into the raw model?
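One common way to do that (untested here, and the paths are placeholders) is to load the base model and the LoRA adapter with the peft library and merge them:

```python
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# placeholder paths: point these at your base LLaMA weights and LoRA adapter
base = LlamaForCausalLM.from_pretrained("path/to/llama-7b-hf", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path/to/alpaca-lora").merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```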
Heya, do you mind laying out the steps you've done to get where you are now? I'm trying to do the same thing but can't get past the initial making-a-params-json-from-the-config-json hurdle.
I just took the "decrypted" pytorch_model-*-of-00003.bin files, put them in models/7B, renamed them so that their names would align with what the scripts in this repo expect, and then ran the standard scripts unmodified.
Like I said, I'm pretty clueless when it comes to deep learning and the formats/conventions they use. I'll keep aimlessly banging my head against this until the non-LoRA Alpaca model works with llama.cpp. gjmulder removed the wontfix tag, which I take as an indication that proper support may be implemented.
In the unlikely case that I manage to get it working before official support is added, I promise to post how here.
Heya, I've figured it out! I took Alpaca-LoRA's export_state_dict_checkpoint.py and adapted it a bit to fit our use case! Here's a link to my tweaked version: https://gist.github.com/botatooo/7ab9aa95eab61d1b64edc0263453230a
Steps:
- Download the tweaked export_state_dict_checkpoint.py and move it into point-alpaca's directory
- Run it using python export_state_dict_checkpoint.py
- Once it's done, you'll want to:
  - create a new directory, I'll call it palpaca
  - rename ckpt to 7B and move it into the new directory
  - copy tokenizer.model from results into the new directory
Your directory structure should now look something like this:
chat.py
encrypt.py
[... other point-alpaca files ...]
palpaca/
7B/
consolidated.00.pth
params.json
tokenizer.model
Note: You'll want to wait until https://github.com/ggerganov/llama.cpp/pull/428 gets merged, or fix the quantize script yourself
Now you can move palpaca into the llama.cpp folder and quantize it as you usually would:
python3 convert-pth-to-ggml.py palpaca/7B/ 1
python3 quantize.py -m palpaca 7B
# start inferencing!
./main -m ./palpaca/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins
I hope this was clear enough
./main -m ./palpaca/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins
I was with you until this step. I'm receiving the below:
llama_model_load: loading model part 1/1 from './alpaca/7B/ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model './alpaca/7B/ggml-model-q4_0.bin'
This issue suggests I should recompile, which I've done. Is the issue with the below?
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
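A quick way to narrow this down (paths assumed from the steps above) is to check the vocab dimension of the exported checkpoint against the 32000 tokens llama.cpp expects:

```python
import torch

# load the exported checkpoint and inspect the vocab dimension of the embedding
sd = torch.load("palpaca/7B/consolidated.00.pth", map_location="cpu")
print(sd["tok_embeddings.weight"].shape)  # llama.cpp expects (32000, 4096) for 7B
```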
Huhhhhhh, I've just tried using LoRA and it worked. However, I got the same error you've gotten when trying with point-alpaca and the tweaked script, so I wonder if it has something to do with the arguments passed to from_pretrained...
@sr-hm @botatooo
The cause of this is that the point-alpaca model in question has an added "[PAD]" token, so the resulting model contains 32001 tokens, but the vocab size was set to 32000, resulting in a mismatch between the tensor shapes and the number of tokens. You can edit the params and change it to 32001, but then it crashes (tokenizer.model does not have this token, and the value isn't supported because it's not divisible by 256; I also don't think you can have uneven numbers because the byte-pair logic needs them to be n / 2?).
i "fixed" it by truncating the shape to 32000 thus yeeting the added token out of existence. it seems to work fine and the token is probably not used anyway, but if it is there is a chance the output could be affected in some way.
ugly hack for point-alpaca: https://gist.github.com/anzz1/6c0b38a1593879065b364bc02f2d3de4
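The core of the idea is just to truncate the affected tensors back to 32000 rows before converting. Here's a sketch of that idea (not the linked gist itself; the tensor names assume the consolidated checkpoint layout):

```python
import torch

VOCAB_SIZE = 32000  # what tokenizer.model actually contains

sd = torch.load("palpaca/7B/consolidated.00.pth", map_location="cpu")
for name in ("tok_embeddings.weight", "output.weight"):
    if name in sd and sd[name].shape[0] > VOCAB_SIZE:
        # drop the extra "[PAD]" row so shapes match the 32000-token vocab
        sd[name] = sd[name][:VOCAB_SIZE, :].clone()
torch.save(sd, "palpaca/7B/consolidated.00.pth")
```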
I have developed the following prompt and script to use this model:
Text transcript of a never ending dialog, where ${USER_NAME} interacts with an AI assistant named ${AI_NAME}.
${AI_NAME} is helpful, kind, honest, friendly, good at writing and never fails to answer ${USER_NAME}’s requests immediately and with details and precision.
There are no annotations like (30 seconds passed...) or (to himself), just what ${USER_NAME} and ${AI_NAME} say aloud to each other.
The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long.
If you are a doctor, please answer the medical questions based on the patient's description.
Doctor: I am Doctor, what medical questions do you have?
Heya, I've figured it out! I took Alpaca-LoRA's export_state_dict_checkpoint.py and adapted it a bit to fit our use case! Here's a link to my tweaked version: https://gist.github.com/botatooo/7ab9aa95eab61d1b64edc0263453230a
Amazing! Where did you get tokenizer.model from?