llama-cpp-python
Server with LangChain and Streaming throws JSONDecodeError
Error
When using LangChain with the server (as outlined in examples/notebooks/Clients.ipynb), setting streaming=True and setting up the callback handlers as in the LangChain documentation, I end up with the error below.
File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 674, in _interpret_response_line
data = json.loads(rbody)
File "/usr/lib64/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.9/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/api_test.py", line 21, in <module>
response = openai(
File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 246, in __call__
return self.generate([prompt], stop=stop).generations[0][0].text
File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 140, in generate
raise e
File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 137, in generate
output = self._generate(prompts, stop=stop)
File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/openai.py", line 279, in _generate
for stream_resp in completion_with_retry(
File "/home/opc/.local/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 167, in <genexpr>
return (
File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 613, in <genexpr>
self._interpret_response_line(
File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 676, in _interpret_response_line
raise error.APIError(
openai.error.APIError: HTTP code 200 from API (2023-04-16 17:10:05.542825)
The server was started simply with python -m llama_cpp.server, and the code for the request I'm making is:
import os
from langchain.llms import OpenAI
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

os.environ["OPENAI_API_KEY"] = "asd"
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"  # this URL was successful, http://100.64.159.73:8000/v1 didn't do anything

openai = OpenAI(
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
    temperature=0,
)

response = openai(
    "Write a short report on government initiatives on climate change, and what can be done to help."
)
Aside from changing OPENAI_API_BASE and the prompt, this is taken directly from the LangChain docs linked above and from Clients.ipynb.
Workaround
Following the traceback to self._interpret_response_line in /home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py and printing the response bodies shows that the error comes from one of the first few streamed responses containing only a date and time rather than a JSON object:
Generating completion...
Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}]}
Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}]}
Response Body: 2023-04-16 17:49:41.938351
Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\\", "index": 0, "logprobs": null, "finish_reason": null}]}
The quick edit to api_requestor.py below works around the problem by skipping any stream line that does not start with {:
return (
    self._interpret_response_line(
        line, result.status_code, result.headers, stream=True
    )
    for line in parse_stream(result.iter_lines())
    if line[0] == "{"  # INSERT THIS LINE
), True
However, this is an edit to the OpenAI package, which is a little inconvenient - is there something llama-cpp-python could do on the server side to fix this? Curiously, short prompts don't seem to trigger the error; the prompt 'The quick brown fox' finishes just fine, as expected.
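If a server-side fix is possible, one direction (purely a sketch, not based on how llama_cpp.server is actually structured; the only_json_lines helper is made up here) would be to validate every chunk before it is written to the event stream, so stray output such as a timestamp never reaches the client:

import json
from typing import Iterable, Iterator

def only_json_lines(chunks: Iterable[str]) -> Iterator[str]:
    # Hypothetical guard: forward only chunks that parse as JSON and silently
    # drop anything else (e.g. a bare "2023-04-16 17:49:41.938351" line).
    for chunk in chunks:
        try:
            json.loads(chunk)
        except json.JSONDecodeError:
            continue
        yield chunk

Either a guard like that, or simply ensuring that nothing but the serialized completion chunks is ever written to the streaming response, would keep unmodified OpenAI clients working.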
I don't know enough about llama-cpp-python to tell if this can be fixed from here, but hopefully this can at least be helpful to anyone with a similar problem.
You can try the same thing with https://github.com/keldenl/gpt-llama.cpp - do you still run into the same problem? You can easily try it by running npx gpt-llama.cpp start and updating the API key to the path to your model.
Any update?