
Server with LangChain and Streaming throws JSONDecodeError

ShreyBiswas opened this issue 2 years ago

Error

When using LangChain with the server (as outlined in examples/notebooks/Clients.ipynb), setting streaming=True and setting up the callback handlers as in the LangChain documentation, I end up getting the error below.

  File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 674, in _interpret_response_line
    data = json.loads(rbody)
  File "/usr/lib64/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.9/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/api_test.py", line 21, in <module>
    response = openai(
  File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 246, in __call__
    return self.generate([prompt], stop=stop).generations[0][0].text
  File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 140, in generate
    raise e
  File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/base.py", line 137, in generate
    output = self._generate(prompts, stop=stop)
  File "/home/opc/.local/lib/python3.9/site-packages/langchain/llms/openai.py", line 279, in _generate
    for stream_resp in completion_with_retry(
  File "/home/opc/.local/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 167, in <genexpr>
    return (
  File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 613, in <genexpr>
    self._interpret_response_line(
  File "/home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py", line 676, in _interpret_response_line
    raise error.APIError(
openai.error.APIError: HTTP code 200 from API (2023-04-16 17:10:05.542825)

The server was started simply with python -m llama_cpp.server, and the code for the request I'm making is:

import os
from langchain.llms import OpenAI
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

os.environ["OPENAI_API_KEY"] = "asd"
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1" # this URL was successful, http://100.64.159.73:8000/v1 didn't do anything

openai = OpenAI(
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
    temperature=0,
)

response = openai(
    "Write a short report on government initiatives on climate change, and what can be done to help."
)

Aside from changing OPENAI_API_BASE and the prompt, this is taken directly from the LangChain docs linked above and Clients.ipynb.

Workaround

After looking through the traceback, going to self._interpret_response_line( in /home/opc/.local/lib/python3.9/site-packages/openai/api_requestor.py, and inspecting the responses, it turns out the error comes from one of the first few streamed responses containing only a date and time rather than a JSON object:

Generating completion...
Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}]}

Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}]}

Response Body: 2023-04-16 17:49:41.938351

Response Body: {"id": "cmpl-9f63af96-cd57-4b64-88a2-978363ec3697", "object": "text_completion", "created": 1681667366, "model": "./Vicuna-13B-ggml-4bit-delta-merged_2023-04-03/vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin", "choices": [{"text": "\\", "index": 0, "logprobs": null, "finish_reason": null}]}

This quick edit to api_requestor.py solves the problem:

            return (
                self._interpret_response_line(
                    line, result.status_code, result.headers, stream=True
                )
                for line in parse_stream(result.iter_lines())
                if line[0] == "{"  # INSERT THIS LINE
            ), True
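
All the added check does is skip any streamed line that isn't a JSON object, so the stray timestamp lines are dropped before json.loads ever sees them.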

However, this is an edit to the OpenAI package itself, which feels a little inconvenient - is there something llama-cpp-python could do in the server to fix this? Curiously, short prompts don't seem to trigger the error - the prompt 'The quick brown fox' finishes just fine, as expected.
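
If editing the installed openai package is too invasive, the same filter can be applied at runtime with a small monkey-patch. This is only a sketch based on the module-level parse_stream generator that shows up in the traceback above (openai-python 0.x), so treat the details as an assumption:

# Sketch of the same filter as a runtime monkey-patch, so the installed
# openai package stays untouched. Assumes openai-python 0.x, where the
# module-level parse_stream generator (seen in the traceback above) yields
# each streamed payload string.
import openai.api_requestor as api_requestor

_original_parse_stream = api_requestor.parse_stream

def _filtered_parse_stream(rbody):
    # Skip any streamed chunk that isn't a JSON object, e.g. the bare
    # timestamp lines shown in the response bodies above.
    for line in _original_parse_stream(rbody):
        if line and line.lstrip().startswith("{"):
            yield line

api_requestor.parse_stream = _filtered_parse_stream

Running this once before making the LangChain call keeps site-packages untouched.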

I don't know enough about llama-cpp-python to tell if this can be fixed from here, but hopefully this can at least be helpful to anyone with a similar problem.

ShreyBiswas commented Apr 16, 2023

Can you try the same thing with https://github.com/keldenl/gpt-llama.cpp and see if you still run into the same problem? You can easily try this out by running npx gpt-llama.cpp start and setting the API key to the path of your model.

keldenl commented Apr 19, 2023

Any update?

gjmulder commented May 21, 2023