
Local RAG agent with LLaMA3 error: Ollama call failed with status code 400. Details: {"error":"unexpected server status: 1"}

Open luca-git opened this issue 9 months ago • 7 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain rather than my code.

Example Code

notebook example code in https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

Error Message and Stack Trace (if applicable)

{'score': 'yes'}
According to the context, agent memory refers to a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language. This memory stream is used by generative agents to enable them to behave conditioned on past experience and interact with other agents.
Traceback (most recent call last):
  File "/home/luca/pymaindir_icos/autocoders/lc_coder/lama3/local_llama3.py", line 139, in <module>
    answer_grader.invoke({"question": question,"generation": generation})
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 158, in invoke
    self.generate_prompt(
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
    raise e
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
    self._generate_with_cache(
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
    result = self._generate(
             ^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 259, in _generate
    final_chunk = self._chat_stream_with_aggregation(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 190, in _chat_stream_with_aggregation
    for stream_resp in self._create_chat_stream(messages, stop, **kwargs):
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 162, in _create_chat_stream
    yield from self._create_stream(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/llms/ollama.py", line 251, in _create_stream
    raise ValueError(
ValueError: Ollama call failed with status code 400. Details: {"error":"unexpected server status: 1"}

Description

Running the example code I get the above error. This does not happen with Mistral, so I guess my Ollama install is OK. I also get the first "yes" from llama3 if I'm not mistaken, so I suspect it's related to something not working here:

from pprint import pprint

inputs = {"question": "What are the types of agent memory?"}
# Stream the graph and report each node as it finishes.
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
# Print the generation from the last node's output.
pprint(value["generation"])

System Info

Ubuntu 22.04.4 LTS, Anaconda, and VS Code.

luca-git · Apr 25 '24 12:04

I have been experiencing the same issue. It seems to happen at random when a request is sent to Ollama. This thread suggests configuring a retry method: https://github.com/langchain-ai/langchain/issues/20773#issuecomment-2072117003. It seems to work, but it would be nice to get an official fix.
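
For reference, the retry approach from that comment can be expressed with LangChain's built-in Runnable.with_retry. A minimal sketch, assuming a grader chain shaped like the one in the notebook (the prompt wording below is illustrative, not the notebook's exact text):

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3", format="json", temperature=0)

# Illustrative grader prompt; the real notebook prompt differs.
prompt = PromptTemplate(
    template=(
        "Grade whether the answer resolves the question.\n"
        "Question: {question}\nAnswer: {generation}\n"
        "Return a JSON object with a single key 'score', value 'yes' or 'no'."
    ),
    input_variables=["question", "generation"],
)

# Retry the chain when the Ollama wrapper raises its status-400 ValueError.
answer_grader = (prompt | llm | JsonOutputParser()).with_retry(
    retry_if_exception_type=(ValueError,),
    stop_after_attempt=3,
)

answer_grader.invoke({
    "question": "What are the types of agent memory?",
    "generation": "Short-term and long-term memory.",
})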

Gyarados · Apr 28 '24 02:04

I'm seeing the same issue; the retry referenced here seems to help at times, but not 100%: https://github.com/langchain-ai/langchain/issues/20773#issuecomment-2072117003

matrodge · Apr 29 '24 17:04

I can second this exactly:

I'm seeing the same issue; the retry referenced here seems to help at times, but not 100%: langchain-ai/langchain#20773 (comment)

MichlF · May 01 '24 13:05

The workaround works (sometimes) after updating Ollama to 0.1.9.

luca-git · May 01 '24 19:05

Still an issue; not really functional on Ollama 0.1.32.

EDIT: resolved. Solved for me by ensuring that other Ollama instances on the system (other Ubuntu instances under WSL, or on the host Windows machine) were off, or uninstalled in the case of the Windows version. Ollama may have a bug related to stopping the server.
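
If it helps anyone hit by the same thing, here is a quick way to check whether some Ollama server is still listening on the default port (127.0.0.1:11434 is the default host/port and is assumed here):

import socket

# Probe the default Ollama port; a successful connect while you believe
# the server is stopped suggests another instance (e.g. another WSL
# distro or the Windows host) is still running.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(1)
    in_use = s.connect_ex(("127.0.0.1", 11434)) == 0

print("port 11434 in use:", in_use)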

wdonno · May 06 '24 00:05

This seems to be more of an Ollama issue in this case. Or is there something specific to this notebook that you want fixed?

hinthornw · May 07 '24 05:05

It's a terrific notebook and I'd love to see it working with Ollama and Llama 3. I believe the issue affects every Llama 3 implementation, so fixing it would help greatly.

luca-git · May 07 '24 11:05