
Python bindings for llama.cpp

Results: 424 llama-cpp-python issues, sorted by recently updated

Add CPU wheels with AVX, AVX2, AVX512 with OpenBLAS & remove unnecessary 32-bit wheels - Without AVX: Ubuntu, Windows => 32 bits, mac => 64 bits - AVX: Ubuntu,...

Replace uvicorn with hypercorn to support IPv6
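The motivation for the swap is that hypercorn can bind dual-stack. A sketch of what the invocation might look like; the module path `llama_cpp.server.app:app` is an assumption here, not the project's confirmed entry point:

```shell
# Bind hypercorn to both IPv4 and IPv6 (the app path below is hypothetical;
# substitute the server's actual ASGI application).
hypercorn --bind '0.0.0.0:8000' --bind '[::]:8000' llama_cpp.server.app:app
```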

# Expected Behavior
The server should cache both the previous prompt and the last generation.
# Current Behavior
The cache misses at the end of the previous prompt, forcing to...
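The caching behavior above comes down to prefix matching. A minimal stdlib sketch (function and variable names are illustrative, not the project's actual cache code) showing why caching prompt *plus* generation lets a follow-up request hit the full cached sequence:

```python
def common_prefix_len(cached, new_tokens):
    """Length of the shared token prefix between cache and new prompt."""
    n = 0
    for a, b in zip(cached, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy token ids. Caching prompt + generation (what the issue asks for)
# means a follow-up prompt that continues the conversation matches the
# whole cached sequence instead of missing at the generation boundary.
prev_prompt = [1, 15043, 29892]
generation = [3186, 29991]
cache = prev_prompt + generation

next_prompt = prev_prompt + generation + [1128]
hit = common_prefix_len(cache, next_prompt)  # full cache reused
```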

With the update to v1, OpenAI's API changed significantly. While backwards compatibility was straightforward to preserve on the server, the Python API is lagging. The main difference in the pre...

enhancement

Fixes #599. Thanks for all your work on this project!

Having multiple BOS tokens can ruin generation. This can occur in several ways, usually through the user adding them unnecessarily; in this case, remove the first token if we detect two in a...

# Prerequisites Please answer the following questions for yourself before submitting an issue. - [x] I am running the latest code. Development is very rapid so there are no tagged...

bug

# Expected Behavior
From issue #302, I expected the model to be unloaded with the following function:
```
def unload_model():
    global llm
    llama_free_model(llm)
    # Delete the model object...
```

bug

I upgraded from an older version and experienced a disturbingly long read-ahead time. The load on my machine is about the same (a bit higher with Python, but that's understandable)...

Next step towards #1336: adds a new parameter to pass arbitrary arguments to the template, much like transformers, except through an explicit parameter instead of just plain...
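The idea of threading caller-supplied keyword arguments through to a chat template can be sketched with the stdlib's `string.Template` standing in for the real Jinja machinery; the function and parameter names below are hypothetical:

```python
from string import Template

def render_chat(template: str, messages, **template_kwargs):
    """Toy renderer: the actual feature would forward **template_kwargs
    into the Jinja chat template, much like transformers'
    apply_chat_template accepts extra keyword arguments."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    body = "\n".join(lines)
    return Template(template).safe_substitute(body=body, **template_kwargs)

out = render_chat(
    "[$system_hint]\n$body",
    [{"role": "user", "content": "hi"}],
    system_hint="be brief",  # arbitrary, caller-supplied template argument
)
```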