llama-cpp-python
llama-cpp-python copied to clipboard
The results generated are different from those produced by executing commands with the llama cpp library
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
When running DeepSeek-R1 on an arm64 architecture with llama-cpp-python version 0.3.7, I expect the performance and results to be comparable to those achieved using the llama.cpp library.
Current Behavior
When running the DeepSeek-R1 model on an arm64 architecture and calling the create_chat_completion function from the llama-cpp-python library's Llama module, the output differs from the results produced by the ./llama-cli built using the llama.cpp library.
Details:
- The output from
create_chat_completiondoes not include the<thinking>tag. - The responses tend to stop prematurely, not providing complete answers as expected.
Environment and Context
-
Physical (or virtual) hardware you are using:
- CPU: Cortex-A73 (aarch64 architecture)
-
Operating System:
- Linux 20.04
-
SDK version:
- Python: 3.8.10
- Make: 4.2.1
- G++: 9.4.0
-
llama-cpp-python package version:
- 0.3.7
Steps to Reproduce
-
Install llama-cpp-python with specific CMake arguments:
CMAKE_ARGS="-DGGML_NATIVE=OFF -DGGML_CROSS_COMPILE=ON" pip install llama-cpp-python -
Invoke
create_chat_completionto generate a responsedef generate(self, prompt, top_k, temp): output = self.llm.create_chat_completion( prompt, top_k=top_k, temperature=temp, max_tokens=self.max_tokens, stop=["Q:", "\n"], stream=self.stream, ) return output
usage phenomenon
The content following “You:” is my input question,the "Okay ....." is the answer, you can notice that the answer stop prematurely
You: To solve the system of equations x+y=20 and 2x+4y=56. what are the values of x and y?
Okay, so I have this system of equations to solve: x plus y equals 20, and 2x plus 4y equals 56. Hmm, let me see how to approach this. I remember that there are a couple of methods to solve systems like this, like substitution and elimination. Maybe I'll try substitution first since it's often straightforward.
INFO:root:Bot: Okay, so I have this system of equations to solve: x plus y equals 20, and 2x plus 4y equals 56. Hmm, let me see how to approach this. I remember that there are a couple of methods to solve systems like this, like substitution and elimination. Maybe I'll try substitution first since it's often straightforward.
INFO:root:llm generate time: 21.382812 s