ctransformers Integration with Llama 2

Has anyone gotten this library running with Llama 2? I can use it for simple question-answering flows, but it seems to break down as the prompt gets more complicated or contains more metadata. In particular, it seems to struggle with numerical input: the output becomes repetitive and confused.

Just curious about your setup if so.

Jul 31 '23 22:07 sabaimran

Which model (7B, 13B etc.) and quantization format (Q2, Q4_0, etc.) are you using? If you have enough RAM, you can try using larger models.

Aug 03 '23 21:08 marella

I ran into the same problem. I am using an alpaca2-7B 8-bit quantized model (ggml-model-q8_0.bin). It behaved weirdly, producing lots of nonsense, and the prompt does not seem to work.

Aug 08 '23 09:08 jamesljl

I am successfully running llama2-7b in GGML (Q4_0 and Q4_1). It's quite fast on CPU and very fast on an A10G (with 24 GB of VRAM) when offloading layers to GPU.

Output is fine, although do note this is a fine-tuned model and not the base one.

Aug 08 '23 11:08 drvenabili

Llama 2 models are fully supported. I just ran the model llama-2-13b-chat.ggmlv3.q8_0.bin with the following code on my Mac M1 with 64 GB of RAM, and it works great:

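from ctransformers import AutoModelForCausalLM

# Load a local GGML model file; gpu_layers sets how many layers are offloaded to the GPU.
llm = AutoModelForCausalLM.from_pretrained('./models/llama-2-13b-chat.ggmlv3.q8_0.bin',
                                           model_type='llama',
                                           gpu_layers=50)
prompt = 'What do you know about superconductors?'
# Stream the generated text to stdout as it is produced.
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)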

Aug 14 '23 18:08 alexshmmy

@jamesljl do you have a link to download the model? Also, please share the code along with the params you are using.

Aug 15 '23 11:08 marella

Does ctransformers use the correct Llama instruction template? Does it use anything at all or do I have to make sure it's using it myself? I am talking about this:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST]

Aug 29 '23 11:08 m-from-space

No, ctransformers doesn't apply any prompt template, because the right template depends on the specific model you are using. You need to include it in the prompt yourself.

Aug 30 '23 22:08 marella
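
For anyone wanting to include the template by hand, here is a minimal sketch of wrapping a message in the Llama 2 chat format quoted above. The helper name build_llama2_prompt and its parameters are hypothetical and not part of ctransformers. Whether the leading <s> should appear as literal text depends on how your setup tokenizes the prompt, which is an assumption to check, so it is left configurable.

```python
def build_llama2_prompt(user_message, system_prompt=None, bos="<s>"):
    """Wrap a single user message in the Llama 2 chat instruction format.

    Hypothetical helper, not part of ctransformers. Pass bos="" if your
    setup already adds the beginning-of-sequence token (assumption: verify
    this for your model and tokenizer).
    """
    user_message = user_message.strip()
    if system_prompt:
        # The system prompt sits inside <<SYS>> ... <</SYS>> within the first [INST] block.
        body = f"<<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n{user_message}"
    else:
        body = user_message
    return f"{bos}[INST] {body} [/INST]"


prompt = build_llama2_prompt(
    "There's a llama in my garden. What should I do?",
    system_prompt="You are a helpful, respectful and honest assistant.",
)
print(prompt)
```

The resulting string can then be passed as the prompt to the llm(...) call shown earlier in the thread.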