llama.cpp
Is it possible to use llama.cpp with other neural networks?
I have no clue about this, but I saw that chatglm-6b was published, which should run on a CPU with 16 GB of RAM, albeit very slowly. https://huggingface.co/THUDM/chatglm-6b/tree/main
Would it be possible to substitute it for the llama model?
If you are going to write the code in C based on ggml, yes. Also, please move this to https://github.com/ggerganov/llama.cpp/discussions
So far, 10 different models across 5 different architectures (including OpenAssistant and Open-Chat-Kit models) are supported by nolanoorg/cformers.
You can now interface with the models from Python in just 3 lines of code:
from interface import AutoInference as AI
ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
x = ai.generate("<|prompter|>What's the Earth total population<|endoftext|><|assistant|>", num_tokens_to_generate=100); print(x['token_str'])
Generation speed is the same as this repo (75 ms/token for a 12B model on a MacBook Pro).
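For anyone trying this out, here is a minimal sketch of two small helpers around the snippet above. Both function names are hypothetical and are not part of cformers. The prompt layout is inferred only from the single OpenAssistant example in this comment, so other models may expect different markers, and the timing estimate assumes the quoted 75 ms/token holds constant for the whole generation.

```python
# Hypothetical helpers for illustration; neither function is part of cformers.

# Markers copied from the OpenAssistant example prompt in this thread.
PROMPTER = "<|prompter|>"
END_OF_TEXT = "<|endoftext|>"
ASSISTANT = "<|assistant|>"


def build_oasst_prompt(question: str) -> str:
    """Wrap one user question in the prompter/assistant markers."""
    return f"{PROMPTER}{question}{END_OF_TEXT}{ASSISTANT}"


def estimate_seconds(num_tokens: int, ms_per_token: float = 75.0) -> float:
    """Rough generation time, assuming a constant per-token latency."""
    return num_tokens * ms_per_token / 1000.0


if __name__ == "__main__":
    prompt = build_oasst_prompt("What's the Earth total population")
    print(prompt)
    # At 75 ms/token, 100 tokens should take roughly 7.5 seconds.
    print(estimate_seconds(100))
```

The resulting string can be passed as the first argument to `ai.generate(...)` exactly as in the 3-line example, with `num_tokens_to_generate=100`.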