
Low level API example failed to run

Open islwx opened this issue 2 years ago • 5 comments

I attempted to run the low-level API example with version 0.2.11, but it fails whether the package is installed from PyPI or compiled from source.

python: 3.10.12
llama_cpp_python: 0.2.11

{llama-cpp-python/examples/low_level_api}$ python low_level_api_llama_cpp.py
Traceback (most recent call last):
  File ".../llama-cpp-python/examples/low_level_api/low_level_api_llama_cpp.py", line 15, in <module>
    model = llama_cpp.llama_load_model_from_file(MODEL_PATH.encode('utf-8'), lparams)
  File ".../llama-cpp-python/llama_cpp/llama_cpp.py", line 498, in llama_load_model_from_file
    return _lib.llama_load_model_from_file(path_model, params)
ctypes.ArgumentError: argument 2: TypeError: expected llama_model_params instance instead of llama_context_params
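
For context, recent llama.cpp releases split the model-loading parameters out of llama_context_params into a separate llama_model_params struct, which is why the example's call fails. A minimal sketch of the mismatch (function and struct names as exposed by llama_cpp_python 0.2.x; the model path is a placeholder):

import llama_cpp

# What the example currently does: build context params and pass them to the model loader
lparams = llama_cpp.llama_context_default_params()
# model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", lparams)
# -> ctypes.ArgumentError: expected llama_model_params instance

# What the newer signature expects: a separate llama_model_params struct
mparams = llama_cpp.llama_model_default_params()
model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", mparams)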

islwx avatar Oct 24 '23 06:10 islwx

@islwx Hello, if you find the solution, please write it here; I am struggling as well. I think this is also connected to the server's 500 error problem. Thanks.

saboTec avatar Oct 26 '23 08:10 saboTec

I tried an older version and had no issues. This is clearly a case where the low-level API examples have not kept up with the current version's API changes. The developers need to update the examples and the related documentation.

islwx avatar Oct 27 '23 02:10 islwx

same problem

myrainbowandsky avatar Nov 01 '23 13:11 myrainbowandsky

Just updated from pip and am getting the same issue. The fix is to use llama_cpp.llama_model_default_params() for the model-loading call:

self.lparams = llama_cpp.llama_context_default_params()  # parameters for the context
self.mparams = llama_cpp.llama_model_default_params()    # parameters for loading the model
self.model = llama_cpp.llama_load_model_from_file(model_path.encode('utf-8'), self.mparams)
self.ctx = llama_cpp.llama_new_context_with_model(self.model, self.lparams)
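
For anyone running this outside a class, a minimal standalone sketch of the same fix, assuming a 0.2.x build from around this time (the model path is a placeholder, and the llama_backend_init signature has changed in later releases):

import llama_cpp

model_path = "./models/7b/llama-model.gguf"  # placeholder path

llama_cpp.llama_backend_init(False)  # call once per process; newer releases drop the numa argument

mparams = llama_cpp.llama_model_default_params()    # model-loading params
lparams = llama_cpp.llama_context_default_params()  # context params (n_ctx, ...)

model = llama_cpp.llama_load_model_from_file(model_path.encode("utf-8"), mparams)
ctx = llama_cpp.llama_new_context_with_model(model, lparams)

# ... run inference ...

llama_cpp.llama_free(ctx)
llama_cpp.llama_free_model(model)
llama_cpp.llama_backend_free()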

smoskal avatar Nov 01 '23 14:11 smoskal

I modified the sample code in the README.

llama_cpp_python: 0.2.90

import llama_cpp
import ctypes
llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
lparams = llama_cpp.llama_context_default_params()
mparams = llama_cpp.llama_model_default_params()
# use bytes for char * params
model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", mparams)
ctx = llama_cpp.llama_new_context_with_model(model, lparams)
max_tokens = lparams.n_ctx
# use ctypes arrays for array params
tokens = (llama_cpp.llama_token * int(max_tokens))()
prompt = "Q: Name the planets in the solar system? A: "
pbytes = bytes(prompt, "utf-8")
n_tokens = llama_cpp.llama_tokenize(model, pbytes, len(pbytes), tokens, len(tokens), False, False)  # add_special=False, parse_special=False
llama_cpp.llama_free(ctx)
print(tokens[:n_tokens])

Equivalent, using the high-level API:

from llama_cpp import Llama
llm = Llama("./models/7b/llama-model.gguf")
prompt = "Q: Name the planets in the solar system? A: "
tokens = llm.tokenize(bytes(prompt, "utf-8"), False)
print(tokens)
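
If you also want the text back from the token IDs, the high-level Llama object can detokenize them; a short follow-up to the snippet above (Llama.detokenize returns bytes):

print(llm.detokenize(tokens).decode("utf-8"))  # round-trips the token IDs back to the prompt text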

7shi avatar Sep 10 '24 10:09 7shi