llama-cpp-python issues

Huge difference in performance between llama.cpp and llama-cpp-python

1

# Prerequisites Please answer the following questions for yourself before submitting an issue. - [ X] I am running the latest code. Development is very rapid so there are no...

kseyhan

bug

performance

Count/truncate number of tokens before processing

7

Create a function that takes in text as input, converts it into tokens, counts the tokens, and then returns the text with a maximum length that is limited by the...

jakvb

enhancement

question
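A minimal sketch of the kind of helper this request describes, assuming the tokenizer and detokenizer are passed in as plain callables. The names `truncate_to_max_tokens`, `whitespace_tokenize`, and `whitespace_detokenize` are hypothetical and used only for illustration; the whitespace tokenizer is a stand-in, not the model's real tokenizer.

```python
from typing import Callable, List, Tuple


def truncate_to_max_tokens(
    text: str,
    max_tokens: int,
    tokenize: Callable[[str], List],
    detokenize: Callable[[List], str],
) -> Tuple[str, int]:
    """Return (possibly truncated text, original token count).

    Hypothetical helper: `tokenize` and `detokenize` are supplied by the
    caller, so the function does not depend on any specific library.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    tokens = tokenize(text)
    count = len(tokens)
    if count <= max_tokens:
        # Already within the limit: return the text unchanged.
        return text, count
    # Keep only the first max_tokens tokens and turn them back into text.
    return detokenize(tokens[:max_tokens]), count


# Toy stand-in tokenizer for demonstration only (assumption, not a real model tokenizer).
def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def whitespace_detokenize(tokens: List[str]) -> str:
    return " ".join(tokens)


truncated, n_tokens = truncate_to_max_tokens(
    "one two three four five", 3, whitespace_tokenize, whitespace_detokenize
)
```

With a real model, the callables would presumably wrap the model's own tokenizer; in llama-cpp-python that likely means thin wrappers around `Llama.tokenize` and `Llama.detokenize`, which work on bytes rather than `str`, so encoding and decoding would be needed.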

Train Llama-3-8B-Instruct on my own dataset

1

How can I train llama-3-8B-Instruct with my own dataset in csv format? Do you have a code or idea?

masterofaimoein

Support PyPI-installed `nvidia-cuda-runtime-cu12` and `nvidia-cublas-cu12`

**Is your feature request related to a problem? Please describe.** PyTorch is able to install its CUDA dependencies via the above wheels during pip-install. Adding these wheels to the dependencies...

Interpause

Update simple Docker

#1423

yentur

Implement code interpreter feature for functionary

1

- Enables the code interpreter/generation feature of functionary models by providing `{"type": "code_interpreter"}` in one of the tools. - Adjusts the prompt template when the code_interpreter tool is provided in the request -...

jeffrey-fong

When deploying the llava-cpp-python server in k8s as a service, it can only answer questions about the first image

@abetlen Hello, when I used python -m llama_cpp.server to deploy a llava13b service on the Kubernetes platform, I noticed an issue where only the first image could be correctly returned. When...

adogwangwang

Add the Phi 3 mini chat format

3

*my first PR :)

reddiamond1234

Is there support for loading a sharded gguf file?

5

**Is your feature request related to a problem? Please describe.** Inquiring whether this project supports loading a "sharded" gguf model file? The llama cpp project appears to add tooling...

jharonfe

Loading sharded (GGUF) model files from HF with Llama.from_pretrained() 'additional_files' argument

The added code allows specifying multiple files to load via HuggingFace Hub in Llama.from_pretrained(). The new argument takes a List of strings, which are used the same as the 'file_name' string...

Gnurro
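As a rough illustration of how the list of shard files for such an argument might be built, the sketch below generates shard filenames. It assumes the `<prefix>-00001-of-0000N.gguf` naming pattern that llama.cpp's split tooling appears to produce; the helper name `shard_filenames` and the example prefix are hypothetical, and the code does not call llama-cpp-python itself.

```python
from typing import List


def shard_filenames(prefix: str, total: int) -> List[str]:
    """Build shard filenames such as 'model-00001-of-00003.gguf'.

    Assumption: shards are numbered from 1 and zero-padded to five digits,
    following the pattern used by llama.cpp's gguf splitting tool.
    """
    if total < 1:
        raise ValueError("total must be at least 1")
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


# Hypothetical example: a model split into three shards.
names = shard_filenames("model-q4_k_m", 3)
first_shard, remaining_shards = names[0], names[1:]
```

Under the proposal described above, the first shard would presumably be given as the main file name and the remaining shards as the 'additional_files' list; the exact parameter semantics are those of the pull request, not something this sketch confirms.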
