llama-cpp-python
Python bindings for llama.cpp
llava-phi-3-mini uses the Phi-3-instruct chat template. I think it is similar to the current llava-1-5 handler, but with the Phi-3 instruct template instead of the llama 2 one. format: `\nQuestion \n` stop word is for system...
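As a rough illustration of what a Phi-3-instruct-style chat template looks like, here is a minimal sketch. The `format_phi3` helper and the exact role/stop tokens are assumptions for illustration; the template actually used by llava-phi-3-mini may differ in detail.

```python
# Hypothetical sketch of a Phi-3-instruct-style chat template.
# The <|role|> / <|end|> tokens are assumed here; verify against the
# model's own chat template before relying on them.
def format_phi3(messages):
    """Render a list of {"role", "content"} dicts as a Phi-3-style prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to produce its reply
    return "".join(parts)

prompt = format_phi3([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe the image."},
])
```

With a template like this, `<|end|>` would serve as the natural stop word when sampling the assistant's turn.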
When I start the llava13b model with the llama-cpp-python server, I notice that GPU memory usage increases a little after each inference, which suggests that the GPU memory is...
# Prerequisites Please answer the following questions for yourself before submitting an issue. - [Yes] I am running the latest code. Development is very rapid so there are no tagged...
`llama-cpp-python` exhibits a severe bottleneck on the main Python thread that is not present in `llama.cpp` itself. Running a server with `llama.cpp` directly using ```sh ./server -ngl 999 -m models/Meta-Llama-3-8B-Instruct.Q8_0.gguf --port 12345...
When running two simultaneous requests, it crashes with a core dump. ``` GGML_ASSERT: /tmp/pip-install-uaaiunx2/llama-cpp-python_d6e61d67fc93418ab936c848aabd7f64/vendor/llama.cpp/ggml.c:4997: ggml_nelements(a) == ne0*ne1*ne2 ``` I'm running version 0.2.63 on Docker with an Nvidia Tesla P40. Here...
Many people use this library on Nvidia Jetson/Orin devices, but there are no prebuilt wheels available for CUDA on ARM architectures. Could support for automated builds of these wheels be added?...
It's very frustrating that many messages, such as model parameters, are written to stderr, where they are very difficult to distinguish from actual errors. I tried to capture stderr but then...
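One reason capturing stderr is tricky here: the log output comes from the native llama.cpp code, which writes to file descriptor 2 directly, so Python-level tools like `contextlib.redirect_stderr` never see it. A minimal sketch of capturing at the descriptor level (the `capture_c_stderr` helper is hypothetical, not part of the library):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def capture_c_stderr():
    """Temporarily redirect file descriptor 2 (used by native code such as
    llama.cpp) into a temp file. contextlib.redirect_stderr only swaps the
    Python-level sys.stderr object, so it misses C-level writes."""
    saved_fd = os.dup(2)                      # keep the real stderr
    tmp = tempfile.TemporaryFile(mode="w+b")
    os.dup2(tmp.fileno(), 2)                  # point fd 2 at the temp file
    result = {}
    try:
        yield result
    finally:
        os.dup2(saved_fd, 2)                  # restore the real stderr
        os.close(saved_fd)
        tmp.seek(0)
        result["data"] = tmp.read()
        tmp.close()

# Usage: anything written to fd 2 inside the block is captured.
with capture_c_stderr() as captured:
    os.write(2, b"native log line\n")  # stand-in for llama.cpp chatter
```

Note that buffered Python output to `sys.stderr` should be flushed before the block exits, or it may land after the descriptor is restored.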
Hi @abetlen, I checked the parameters of both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which would enable **contrastive search** decoding. Can you update the decoding strategy soon...
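For context on what `penalty_alpha` controls (the name follows Hugging Face transformers' contrastive search option): each candidate token is ranked by model confidence minus a degeneration penalty, the maximum cosine similarity between the candidate's hidden state and those of previous tokens. A toy sketch of that scoring rule, not llama-cpp-python API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_score(prob, cand_hidden, prev_hiddens, penalty_alpha=0.6):
    """Contrastive search ranking: (1 - alpha) * confidence minus
    alpha * max similarity to any previous token's hidden state."""
    penalty = max(cosine(cand_hidden, h) for h in prev_hiddens)
    return (1 - penalty_alpha) * prob - penalty_alpha * penalty

# A candidate whose hidden state repeats the context is penalized even
# when the model is confident in it, discouraging degenerate repetition.
repeat_score = contrastive_score(0.9, [1.0, 0.0], [[1.0, 0.0]])
novel_score = contrastive_score(0.5, [0.0, 1.0], [[1.0, 0.0]])
```

At each step the decoder would apply this score to the top-k candidates and pick the argmax, which is what makes the strategy different from plain greedy or top-k sampling.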