Is it possible to use llama.cpp with other neural networks?

Open dbpaul opened this issue 1 year ago • 2 comments

I have no clue about this, but I saw that chatglm-6b was published, which should run on a CPU with 16 GB of RAM, albeit very slowly. https://huggingface.co/THUDM/chatglm-6b/tree/main

Would it be possible to substitute it for the LLaMA model?

dbpaul, Mar 15 '23 09:03

Yes, if you are going to write the code in C based on ggml. Also, please move this to https://github.com/ggerganov/llama.cpp/discussions

v3ss0n, Mar 15 '23 19:03

So far, 10 different models across 5 different architectures (including OpenAssistant and Open-Chat-Kit models) are supported by nolanoorg/cformers.

You can now interface with the models from Python with just 3 lines of code.

from interface import AutoInference as AI
ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
x = ai.generate("<|prompter|>What's the Earth total population<|endoftext|><|assistant|>", num_tokens_to_generate=100); print(x['token_str'])
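The prompt string in the call above wraps the user's question in special tokens: `<|prompter|>` opens the user turn, `<|endoftext|>` closes it, and `<|assistant|>` cues the model's reply. A minimal sketch of a helper that builds such a string is shown here; the name `build_oasst_prompt` is hypothetical and is not part of cformers, and the token layout is taken only from the example above.

```python
def build_oasst_prompt(user_message: str) -> str:
    """Wrap a user message in the special tokens used in the example above.

    Hypothetical helper for illustration; not part of cformers.
    """
    return f"<|prompter|>{user_message}<|endoftext|><|assistant|>"


# Rebuild the exact prompt passed to ai.generate(...) in the example.
prompt = build_oasst_prompt("What's the Earth total population")
```

The resulting `prompt` could then be passed as the first argument to `ai.generate(...)` in place of the hand-written string.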

Generation speed is the same as this repo's (75 ms/token for a 12B model on a MacBook Pro).
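To put the 75 ms/token figure in more familiar terms, it can be converted to tokens per second and to total time for a request. This is only arithmetic on the number quoted above, not a new measurement; the function names are made up for illustration, and the 100-token request size is borrowed from `num_tokens_to_generate=100` in the example.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


def generation_seconds(num_tokens: int, ms_per_token: float) -> float:
    """Estimate wall-clock time to generate num_tokens at a fixed latency."""
    return num_tokens * ms_per_token / 1000.0


rate = tokens_per_second(75)         # roughly 13.3 tokens per second
total = generation_seconds(100, 75)  # 7.5 seconds for 100 tokens
```

This assumes a constant per-token latency and ignores prompt processing and model loading time.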

Ayushk4, Mar 25 '23 17:03