
Support GQA export, better run.c, Support tinyllama-1.1B

magician-blue opened this issue 2 years ago

  • Add support for tinyllama-1.1B
  • Add support for exporting GQA models (learned from https://github.com/ggerganov/llama.cpp/pull/3364); a sketch of the GQA head indexing follows this list
  • Improve run.c:
      • save a little memory (same as #400)
      • factor the RoPE computation into its own function
      • hardcode a check for whether there is a \n
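For context, grouped-query attention just means n_kv_heads < n_heads, so several query heads share one key/value head. A minimal sketch of that indexing (the function kv_head_for_query is hypothetical, not run.c's actual code, and covers one cached timestep's kv slice):

    #include <stddef.h>

    /* Hedged sketch, not run.c's exact code: with GQA there are fewer
       key/value heads than query heads, so query head h reads the KV
       cache slice of kv head h / (n_heads / n_kv_heads). */
    static const float* kv_head_for_query(const float* key_cache,
                                          int n_heads, int n_kv_heads,
                                          int head_size, int query_head) {
        int kv_mul = n_heads / n_kv_heads;  /* query heads per kv head */
        int kv_head = query_head / kv_mul;  /* shared kv head index */
        return key_cache + (size_t)kv_head * head_size;
    }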

magician-blue avatar Sep 27 '23 16:09 magician-blue

The current chat schema in run.c is based on Llama 2:

            // render user/system prompts into the Llama 2 Chat schema
            if (pos == 0 && system_prompt[0] != '\0') {
                char system_template[] = "[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]";
                sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
            } else {
                char user_template[] = "[INST] %s [/INST]";
                sprintf(rendered_prompt, user_template, user_prompt);
            }

But you may want to use TinyLlama's template instead:

<|im_start|>user
Explain huggingface.<|im_end|>
<|im_start|>assistant
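Mirroring the snippet above, a hedged sketch of how run.c could render into this ChatML-style template (assuming the system turn uses the same <|im_start|> pattern; variables and buffer sizing are as in the original excerpt):

    // sketch only: render user/system prompts into the ChatML-style schema
    if (pos == 0 && system_prompt[0] != '\0') {
        char system_template[] = "<|im_start|>system\n%s<|im_end|>\n"
                                 "<|im_start|>user\n%s<|im_end|>\n"
                                 "<|im_start|>assistant\n";
        sprintf(rendered_prompt, system_template, system_prompt, user_prompt);
    } else {
        char user_template[] = "<|im_start|>user\n%s<|im_end|>\n"
                               "<|im_start|>assistant\n";
        sprintf(rendered_prompt, user_template, user_prompt);
    }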

In general, chat templates should be tied to the loaded pre-trained model, so maybe the template should be a configuration parameter in the .bin file.
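One hypothetical way to do that (the chat_template field and its values are made up for illustration, not part of the current checkpoint format): extend run.c's header struct with a template id and switch on it when rendering prompts.

    // hypothetical extension of run.c's checkpoint header (illustration only)
    typedef struct {
        int dim;           // transformer dimension
        int hidden_dim;    // ffn hidden dimension
        int n_layers;      // number of layers
        int n_heads;       // number of query heads
        int n_kv_heads;    // number of key/value heads (< n_heads for GQA)
        int vocab_size;    // vocabulary size
        int seq_len;       // max sequence length
        int chat_template; // hypothetical: 0 = Llama 2 [INST], 1 = ChatML, ...
    } Config;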

xefoci7612 avatar Oct 03 '23 05:10 xefoci7612

This is cool, I wasn't aware of the TinyLlama 1.1B run. Sounds very nice and useful for this repo to support. Are there any notable architectural changes in it? This PR is a bit of a random combination of necessary differences and a few side optimizations.

karpathy avatar Oct 09 '23 15:10 karpathy

> This is cool, I wasn't aware of the TinyLlama 1.1B run. Sounds very nice and useful for this repo to support. Are there any notable architectural changes in it? This PR is a bit of a random combination of necessary differences and a few side optimizations.

There aren't any notable architectural changes.

magician-blue avatar Oct 11 '23 05:10 magician-blue