llama-cpp-python

Python bindings for llama.cpp

424 llama-cpp-python issues

The stop-sequence implementation is currently somewhat complicated because it needs to support streaming, and its behavior is ill-defined.

enhancement
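What makes stop sequences tricky under streaming is that a stop string can arrive split across chunks, so the decoder must hold back any emitted suffix that could still turn out to be the start of a stop sequence. The sketch below is illustrative only (the function name and shape are not the library's API), assuming chunks arrive as plain strings:

```python
def stream_with_stop(chunks, stop):
    """Yield text from `chunks`, truncating at the first stop sequence.

    Holds back any trailing text that is a prefix of a stop sequence,
    in case the rest of it arrives in the next chunk.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # If a full stop sequence appears, emit the text before it and finish.
        positions = [buffer.find(s) for s in stop if s in buffer]
        if positions:
            yield buffer[: min(positions)]
            return
        # Otherwise, keep back the longest suffix of the buffer that could
        # still grow into a stop sequence on the next chunk.
        hold = 0
        for s in stop:
            for k in range(1, len(s)):
                if buffer.endswith(s[:k]):
                    hold = max(hold, k)
        if hold:
            yield buffer[:-hold]
            buffer = buffer[-hold:]
        else:
            yield buffer
            buffer = ""
    yield buffer  # stream ended with no stop sequence
```

For example, feeding the chunks `["Hello STO", "P world"]` with stop sequence `"STOP"` yields only `"Hello "`, even though the stop string straddles the chunk boundary.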

Fixes https://github.com/abetlen/llama-cpp-python/issues/79. I forgot to tokenize the end-of-text message. Also fixes "n_predict", which didn't handle -1 before but does now.

The server returns a gateway timeout very often; is there any error handling for this? Maybe increase the timeout. Also, the thread count is set to half the CPU count: https://github.com/abetlen/llama-cpp-python/blob/main/examples/high_level_api/fastapi_server.py#L31

After successfully installing llama-cpp-python under Python 3.10, running `import llama_cpp` produces:

```
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 106, in validate_environment
    from llama_cpp import Llama
ImportError:...
```

The current implementation of caching is wonderful; it's been a great help in speeding up conversations. I do notice it trips up when a secondary user starts a conversation; would it...

I was trying to run an Alpaca model on a framework with a relatively large context window, and the following message keeps popping up: `llama_tokenize: too many tokens`. How could...
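This error generally means the prompt is longer than the context window the model was loaded with (the `n_ctx` parameter of `Llama`). One common workaround is to keep only the most recent tokens so the prompt plus the requested generation budget fits. The helper below is an illustrative sketch, not part of the library:

```python
def fit_to_context(tokens, n_ctx, n_generate):
    """Truncate a token list so that the prompt plus `n_generate`
    new tokens fits inside an `n_ctx`-token context window.

    Keeps the most recent tokens, on the assumption that the tail of a
    conversation matters more than its beginning.
    """
    budget = n_ctx - n_generate
    if budget <= 0:
        raise ValueError("n_generate must be smaller than n_ctx")
    return tokens if len(tokens) <= budget else tokens[-budget:]
```

Alternatively, loading the model with a larger window (e.g. `Llama(model_path=..., n_ctx=2048)`) avoids the truncation, at the cost of more memory.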

I've pushed support for the new LoRA feature of llama.cpp. This feature should allow you to load a base model and apply a LoRA adapter on the fly. I haven't yet published...

help wanted

The current default value is cpu_count/2: https://github.com/abetlen/llama-cpp-python/blob/b2a24bddacc7b10d1ba8a0dff1d8b5fae9bfbad3/llama_cpp/llama.py#L102 This value does not seem optimal for multicore systems: for example, a CPU with 8 cores will have 4 cores idle....
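To illustrate the point: with the cpu_count/2 default, half the logical CPUs sit idle. Since the `Llama` constructor accepts an explicit `n_threads`, the default can simply be overridden. The helper below is illustrative, not library code, and just makes the arithmetic explicit:

```python
import os

def choose_n_threads(requested=None, logical_cpus=None):
    """Pick a thread count for inference.

    If `requested` is given, use it (clamped to at least 1); otherwise
    use every logical CPU instead of the cpu_count/2 default that the
    issue describes.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if requested is not None:
        return max(1, requested)
    return max(1, logical_cpus)
```

With llama-cpp-python this would be passed as, e.g., `Llama(model_path=..., n_threads=choose_n_threads())`. Whether using all logical CPUs is actually faster depends on the machine (hyper-threading and memory bandwidth both matter), so benchmarking a few values is worthwhile.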

I try to load a model and get this error:

```python
llama.cpp: loading model from ggml-model.bin
Traceback (most recent call last):
  File "D:\Projects\llama-cpp-python-test\main.py", line 2, in <module>
    llm = Llama(model_path="ggml-model.bin")
  File...
```