llama-cpp-python

Python bindings for llama.cpp

424 llama-cpp-python issues

The stop-sequence implementation is currently somewhat complicated because it needs to support streaming, and its behavior is ill-defined.

enhancement
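What makes stop sequences tricky under streaming is that a stop string can arrive split across chunks, so the decoder must hold back any emitted suffix that could still turn out to be the start of a stop sequence. The sketch below is illustrative only (the function name and shape are not the library's API), assuming chunks arrive as plain strings:

```python
def stream_with_stop(chunks, stop):
    """Yield text from `chunks`, truncating at the first stop sequence.

    Holds back any trailing text that is a prefix of a stop sequence,
    in case the rest of it arrives in the next chunk.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # If a full stop sequence appears, emit the text before it and finish.
        positions = [buffer.find(s) for s in stop if s in buffer]
        if positions:
            yield buffer[: min(positions)]
            return
        # Otherwise, keep back the longest suffix of the buffer that could
        # still grow into a stop sequence on the next chunk.
        hold = 0
        for s in stop:
            for k in range(1, len(s)):
                if buffer.endswith(s[:k]):
                    hold = max(hold, k)
        if hold:
            yield buffer[:-hold]
            buffer = buffer[-hold:]
        else:
            yield buffer
            buffer = ""
    yield buffer  # stream ended with no stop sequence
```

For example, feeding the chunks `["Hello STO", "P world"]` with stop sequence `"STOP"` yields only `"Hello "`, even though the stop string straddles the chunk boundary.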

Fixes https://github.com/abetlen/llama-cpp-python/issues/79. I forgot to tokenize the end-of-text message. Also fixes "n_predict", which didn't handle -1 before but does now.

The server returns a gateway timeout very often; is there any error handling for this? Maybe increase the timeout. Also, the thread count is set to half the CPU count: https://github.com/abetlen/llama-cpp-python/blob/main/examples/high_level_api/fastapi_server.py#L31

After successfully installing llama-cpp-python under Python 3.10, running `import llama_cpp` produces:

```
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 106, in validate_environment
    from llama_cpp import Llama
ImportError:...
```

The current implementation of caching is wonderful; it's been a great help in speeding up conversations. I do notice it trips up when a secondary user starts a conversation; would it...

I was trying to run an Alpaca model on a framework with a relatively large context window, and the following message keeps popping up: `llama_tokenize: too many tokens`. How could...
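This error generally means the prompt is longer than the context window the model was loaded with (the `n_ctx` parameter of `Llama`). One common workaround is to keep only the most recent tokens so the prompt plus the requested generation budget fits. The helper below is an illustrative sketch, not part of the library:

```python
def fit_to_context(tokens, n_ctx, n_generate):
    """Truncate a token list so that the prompt plus `n_generate`
    new tokens fits inside an `n_ctx`-token context window.

    Keeps the most recent tokens, on the assumption that the tail of a
    conversation matters more than its beginning.
    """
    budget = n_ctx - n_generate
    if budget <= 0:
        raise ValueError("n_generate must be smaller than n_ctx")
    return tokens if len(tokens) <= budget else tokens[-budget:]
```

Alternatively, loading the model with a larger window (e.g. `Llama(model_path=..., n_ctx=2048)`) avoids the truncation, at the cost of more memory.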

I've pushed support for the new LoRA feature of llama.cpp. This feature should allow you to load a base model and apply a LoRA adapter on the fly. I haven't yet published...

help wanted

The current default value is cpu_count/2: https://github.com/abetlen/llama-cpp-python/blob/b2a24bddacc7b10d1ba8a0dff1d8b5fae9bfbad3/llama_cpp/llama.py#L102 This value does not seem optimal for multicore systems: for example, a CPU with 8 cores will have 4 cores idle....
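To illustrate the point: with the cpu_count/2 default, half the logical CPUs sit idle. Since the `Llama` constructor accepts an explicit `n_threads`, the default can simply be overridden. The helper below is illustrative, not library code, and just makes the arithmetic explicit:

```python
import os

def choose_n_threads(requested=None, logical_cpus=None):
    """Pick a thread count for inference.

    If `requested` is given, use it (clamped to at least 1); otherwise
    use every logical CPU instead of the cpu_count/2 default that the
    issue describes.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if requested is not None:
        return max(1, requested)
    return max(1, logical_cpus)
```

With llama-cpp-python this would be passed as, e.g., `Llama(model_path=..., n_threads=choose_n_threads())`. Whether using all logical CPUs is actually faster depends on the machine (hyper-threading and memory bandwidth both matter), so benchmarking a few values is worthwhile.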

I try to load a model and get this error:

```python
llama.cpp: loading model from ggml-model.bin
Traceback (most recent call last):
  File "D:\Projects\llama-cpp-python-test\main.py", line 2, in <module>
    llm = Llama(model_path="ggml-model.bin")
  File...
```