Unable to load models with LlamaCpp
The bug
Loading a GGUF model with models.LlamaCpp raises a TypeError from the llama_cpp.llama_batch_init call:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/teamspace/studios/this_studio/eval.ipynb Cell 3 line 4
      1 from guidance import models, gen, select
      3 path = "mistral-7b-instruct-v0.2.Q8_0.gguf"
----> 4 llm = models.LlamaCpp(model=path, n_ctx=4096, n_gpu_layers=-1)
      6 from llama_cpp import Llama
      7 llm = Llama(
      8     model_path=path,
      9     n_gpu_layers=-1,
     10 )

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/guidance/models/llama_cpp/_llama_cpp.py:74, in LlamaCpp.__init__(self, model, tokenizer, echo, compute_log_probs, caching, temperature, **kwargs)
     71 else:
     72     raise TypeError("model must be None, a file path string, or a llama_cpp.Llama object.")
---> 74 self._context = _LlamaBatchContext(self.model_obj.n_batch, self.model_obj.n_ctx())
     76 if tokenizer is None:
     77     tokenizer = llama_cpp.LlamaTokenizer(self.model_obj)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/guidance/models/llama_cpp/_llama_cpp.py:23, in _LlamaBatchContext.__init__(self, n_batch, n_ctx)
     21 def __init__(self, n_batch, n_ctx):
     22     self._llama_batch_free = llama_cpp.llama_batch_free
---> 23     self.batch = llama_cpp.llama_batch_init(n_tokens=n_batch, embd=0, n_seq_max=n_ctx)
     24     if self.batch is None:
     25         raise Exception("call to llama_cpp.llama_batch_init returned NULL.")
TypeError: this function takes at least 3 arguments (0 given)
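As a small diagnostic (not part of the original report; the keyword-vs-positional distinction is my reading of the error, not something confirmed by the maintainers), you can call the binding from the last traceback frame directly and see whether the installed llama-cpp-python build accepts keyword arguments for llama_batch_init:

import llama_cpp

# Same call that guidance makes in _LlamaBatchContext.__init__, using keyword arguments.
try:
    batch = llama_cpp.llama_batch_init(n_tokens=512, embd=0, n_seq_max=4096)
except TypeError as err:
    print("keyword-argument call rejected:", err)
    # Positional form of the same call; some builds expose llama_batch_init as a
    # raw ctypes function that only accepts positional arguments.
    batch = llama_cpp.llama_batch_init(512, 0, 4096)

llama_cpp.llama_batch_free(batch)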
To Reproduce
from guidance import models, gen, select
path = "mistral-7b-instruct-v0.2.Q8_0.gguf"
llm = models.LlamaCpp(model=path, n_ctx=4096, n_gpu_layers=-1)
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.):
- Guidance Version (guidance.__version__):
I was facing the same error; install llama-cpp-python==0.2.26 and it should work!
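For reference, a quick way to confirm the pinned setup before retrying the load (the pin is the one suggested above; the load call is the one from the issue, and the version attributes are assumed to be available on both packages):

# Pin the working version first, e.g. in a shell or notebook cell:
#   pip install "llama-cpp-python==0.2.26"
import llama_cpp
import guidance
from guidance import models

print("llama-cpp-python:", llama_cpp.__version__)
print("guidance:", guidance.__version__)

# Original load from the issue; with 0.2.26 installed this reportedly works.
path = "mistral-7b-instruct-v0.2.Q8_0.gguf"
llm = models.LlamaCpp(model=path, n_ctx=4096, n_gpu_layers=-1)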
Using an older llama-cpp-python version works but limits the use of some newer models; for example, you can't load stabilityai/stablelm-2-zephyr-1_6b on 0.2.26. We need the team to bump compatibility, ideally to the latest version :)
I think this issue is resolved with PR https://github.com/guidance-ai/guidance/pull/665, but there hasn't been a release since then.
> but there hasn't been a release since then.
Should we expect a release any time soon, or should I simply cherry-pick the change into a local fork?
EDIT: @paulbkoch maybe consider offering a nightly version on PyPI that reflects the latest state of development without requiring us to pip install from GitHub?
Any hope for a release to PyPI soon with this fix?
https://github.com/guidance-ai/guidance/discussions/692
> but there hasn't been a release since then.
> Should we expect a release any time soon, or should I simply cherry-pick the change into a local fork?
> EDIT: @paulbkoch maybe consider offering a nightly version on PyPI that reflects the latest state of development without requiring us to pip install from GitHub?
Has the pull request been merged, so that installing directly from GitHub picks it up? Or is there a way to install directly from GitHub and pull in the pull request at the same time?
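For anyone stuck on the same question: one general option (my suggestion, not maintainer guidance) is to point pip at the repository directly; pip can install from the default branch, or from a specific branch or commit if the fix isn't on the default branch yet. The ref placeholder below is an assumption you'd fill in yourself:

# Run in a shell or notebook cell (the ref is a placeholder, not a real branch name):
#   pip install --upgrade "git+https://github.com/guidance-ai/guidance.git"
#   pip install --upgrade "git+https://github.com/guidance-ai/guidance.git@<branch-or-commit>"
# Then retry the original load:
import guidance
from guidance import models

print("guidance:", guidance.__version__)
path = "mistral-7b-instruct-v0.2.Q8_0.gguf"
llm = models.LlamaCpp(model=path, n_ctx=4096, n_gpu_layers=-1)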