llama2.c
Support GQA export, better run.c, support TinyLlama-1.1B
- Add support for TinyLlama-1.1B
- Add support for converting GQA models (learned from https://github.com/ggerganov/llama.cpp/pull/3364)
- Better run.c:
  - save a little memory (same as #400)
  - make the RoPE part a function (see the sketch after this list)
  - hardcode a check for whether there is a \n
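For the RoPE refactor, here is a minimal sketch of what pulling the rotation out of forward() into its own helper could look like. The function name and signature are assumptions for illustration, not the PR's exact code:

#include <math.h>

// Rotate q (and, within the KV range, k) in place with RoPE for a given position.
// Hypothetical helper; the PR's actual factoring may differ.
static void rope(float* q, float* k, int dim, int kv_dim, int head_size, int pos) {
    for (int i = 0; i < dim; i += 2) {
        int head_dim = i % head_size;
        float freq = 1.0f / powf(10000.0f, head_dim / (float)head_size);
        float val = pos * freq;
        float fcr = cosf(val);
        float fci = sinf(val);
        int rotn = i < kv_dim ? 2 : 1; // rotate both q and k, or q only
        for (int v = 0; v < rotn; v++) {
            float* vec = v == 0 ? q : k;
            float v0 = vec[i];
            float v1 = vec[i + 1];
            vec[i]     = v0 * fcr - v1 * fci;
            vec[i + 1] = v0 * fci + v1 * fcr;
        }
    }
}

The call site in forward() would then shrink to a single line, e.g. rope(s->q, s->k, dim, kv_dim, head_size, pos);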
The current chat schema in run.c is based on Llama 2:
// render user/system prompts into the Llama 2 Chat schema
if (pos == 0 && system_prompt[0] != '\0') {
char system_template[] = "[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]";
sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
} else {
char user_template[] = "[INST] %s [/INST]";
sprintf(rendered_prompt, user_template, user_prompt);
}
But you may want to use TinyLlama's chat template instead:
<|im_start|>user
Explain huggingface.<|im_end|>
<|im_start|>assistant
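For comparison, here is a rough sketch of how run.c's rendering branch could produce the ChatML-style format above. The exact template strings (including the system-prompt variant and trailing newlines) are assumptions, not the PR's code:

// render user/system prompts into a ChatML-style schema (illustrative sketch)
if (pos == 0 && system_prompt[0] != '\0') {
    char system_template[] = "<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n";
    sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
} else {
    char user_template[] = "<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n";
    sprintf(rendered_prompt, user_template, user_prompt);
}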
In general, chat templates should be tied to the loaded pre-trained model, so maybe they should be a configuration parameter in the .bin file.
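One hypothetical way to express that, purely for illustration (the field name, fixed size, and header layout below are made up and are not part of llama2.c's checkpoint format):

#include <stdio.h>

// hypothetical: a chat template string stored right after the Config header in the .bin file
typedef struct {
    int dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len;
} Config;

char chat_template[512]; // assumed fixed-size field; an empty string would mean "no template"

void read_header(FILE* file, Config* config) {
    fread(config, sizeof(Config), 1, file);                // existing config fields
    fread(chat_template, sizeof(chat_template), 1, file);  // template travels with the weights
}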
This is cool, I wasn't aware of the TinyLlama 1.1B run. Sounds very nice and useful for this repo to support. Are there any notable architectural changes in it? This PR is a bit of a random combination of necessary differences, and a few side optimizations.
There aren't any notable architectural changes.