
Bug in CSV agent: basically all queries fail with a JSON error

Open yockgen opened this issue 1 year ago • 5 comments

System Info

Python version: Python 3.10.6
Langchain version: 0.0.219
OS: Ubuntu 22.04

Who can help?

@eyurtsev

Information

  • [X] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

I have simple code like this:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import os
import sys

directory = './test'

f = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        f.append(directory + "/" + filename)

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    f,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

qry ="how many rows are there?"

while True:
  if not qry:
    qry = input("Q: ")
  if qry in ['quit', 'q', 'exit']:
    sys.exit()

  agent.run(qry)
  qry = None

I'm using the Titanic dataset:

https://github.com/datasciencedojo/datasets/blob/master/titanic.csv

The error is as below:

$ python3 langchain-csv.py


> Entering new  chain...
Traceback (most recent call last):
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 112, in _parse_ai_message
    _tool_input = json.loads(function_call["arguments"])
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/mytuition/langchain-csv.py", line 32, in <module>
    agent.run(qry)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 290, in run
    return self(args[0], callbacks=callbacks, tags=tags)[_output_key]
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 166, in __call__
    raise e
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 160, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 987, in _call
    next_step_output = self._take_next_step(
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 803, in _take_next_step
    raise e
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _take_next_step
    output = self.agent.plan(
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 212, in plan
    agent_decision = _parse_ai_message(predicted_message)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 114, in _parse_ai_message
    raise OutputParserException(
langchain.schema.OutputParserException: Could not parse tool input: {'name': 'python', 'arguments': 'len(df)'} because the `arguments` is not valid JSON.
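
For context, the parser in langchain/agents/openai_functions_agent/base.py calls json.loads on the raw arguments string from the model's function call; a bare Python expression such as len(df) is not valid JSON, which is what triggers the JSONDecodeError above. A minimal illustration of just the failing step:

import json

try:
    json.loads('len(df)')                      # bare Python code is not valid JSON
except json.JSONDecodeError as e:
    print(e)                                   # Expecting value: line 1 column 1 (char 0)

print(json.loads('{"query": "len(df)"}'))      # a JSON object like this parses fine: {'query': 'len(df)'}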

Expected behavior

LangChain provides an answer with the total row count.

yockgen avatar Jul 01 '23 01:07 yockgen

Hi @yockgen, try passing the CSV filename directly (as a single string) instead of a list of CSV filenames.

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    'titanic.csv',
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

dzmitryashkinadze avatar Jul 01 '23 13:07 dzmitryashkinadze

I'm also running into the same problem, passing files directly does not help.

kboa avatar Jul 04 '23 01:07 kboa

@dzmitryashkinadze I've changed my code to pass in the file name directly. It runs with my previous queries; however, I get the same error when I ask some other questions, see below:

Code:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import os
import sys

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    "./data/csv/titanic.csv",
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)


#qry = "give some insights on the dataset?" <-- this query is running fine
#qry ="how many rows are there?"  <-- this query is running fine
qry = "listing all passenger's name"   <-- this query is throwing error

while True:
  if not qry:
    qry = input("Q: ")
  if qry in ['quit', 'q', 'exit']:
    sys.exit()

  agent.run(qry)
  qry = None

Error:

# python3 langchain-csv.py
> Entering new  chain...
Traceback (most recent call last):
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 112, in _parse_ai_message
    _tool_input = json.loads(function_call["arguments"])
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/mytuition/langchain-csv.py", line 35, in <module>
    agent.run(qry)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 290, in run
    return self(args[0], callbacks=callbacks, tags=tags)[_output_key]
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 166, in __call__
    raise e
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 160, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 987, in _call
    next_step_output = self._take_next_step(
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 803, in _take_next_step
    raise e
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _take_next_step
    output = self.agent.plan(
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 212, in plan
    agent_decision = _parse_ai_message(predicted_message)
  File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 114, in _parse_ai_message
    raise OutputParserException(
langchain.schema.OutputParserException: Could not parse tool input: {'name': 'python', 'arguments': "passenger_names = df['Name']\npassenger_names"} because the `arguments` is not valid JSON.

yockgen avatar Jul 05 '23 04:07 yockgen

Did anybody find a solution to this?

When I use OpenAI as the LLM for this agent, there's no problem. But if I use ChatOpenAI, I get this error.

PradHolla avatar Jul 07 '23 07:07 PradHolla

@PradHolla I have similar problems. If I use OpenAI, which uses a davinci model, there is no error, but if I use the chat model I get all kinds of errors. The errors with agent_type=AgentType.OPENAI_FUNCTIONS and with agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION are also quite different. The reason I want to switch away from davinci to the chat model is that the current davinci models are now considered 'legacy' by OpenAI without a good replacement, and davinci is also more expensive than turbo:

Davinci | $0.0300 / 1K tokens | $0.1200 / 1K tokens
GPT-3.5 Turbo 4K context | $0.0015 / 1K tokens | $0.002 / 1K tokens
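
A minimal sketch of that davinci-style workaround, assuming the Titanic CSV path and query used earlier in the thread:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

# The completion-style LLM with the default ZERO_SHOT_REACT_DESCRIPTION agent
# avoids the OPENAI_FUNCTIONS JSON-argument parsing that fails above.
agent = create_csv_agent(
    OpenAI(temperature=0),
    "./data/csv/titanic.csv",
    verbose=True,
)

agent.run("how many rows are there?")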

eliujl avatar Jul 08 '23 18:07 eliujl

I've experienced similar problems. To fix this you will need to add an 'args_schema' to PythonAstREPLTool. I basically created a copy of 'langchain/tools/python/tool.py' and added an args schema with a description, so that OpenAI Functions agents as well as ZeroShot agents pass the right tool input.

Code:

# Imports needed to run this standalone copy of langchain/tools/python/tool.py
# (module paths assume langchain ~0.0.219; adjust them for your version).
import ast
import sys
from contextlib import redirect_stdout
from io import StringIO
from typing import Dict, Optional, Type

from pydantic import BaseModel, Field, root_validator

from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)
from langchain.tools.base import BaseTool
from langchain.tools.python.tool import sanitize_input


class AstArgSchema(BaseModel):
    """A schema for the ast argument."""
    query: str = Field(description="A string formatted plain python script with imports and variables to execute.")

class PythonAstREPLTool(BaseTool):
    """A tool for running python code in a REPL."""

    name = "python_repl_ast"
    description = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "When using this tool, sometimes output is abbreviated - "
        "make sure it does not look abbreviated before using it in your answer."
    )
    globals: Optional[Dict] = Field(default_factory=dict)
    locals: Optional[Dict] = Field(default_factory=dict)
    sanitize_input: bool = True
    args_schema: Type[BaseModel] = AstArgSchema

    @root_validator(pre=True)
    def validate_python_version(cls, values: Dict) -> Dict:
        """Validate valid python version."""
        if sys.version_info < (3, 9):
            raise ValueError(
                "This tool relies on Python 3.9 or higher "
                "(as it uses new functionality in the `ast` module, "
                f"you have Python version: {sys.version}"
            )
        return values

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        try:
            if self.sanitize_input:
                query = sanitize_input(query)
            tree = ast.parse(query)
            module = ast.Module(tree.body[:-1], type_ignores=[])
            exec(ast.unparse(module), self.globals, self.locals)  # type: ignore
            module_end = ast.Module(tree.body[-1:], type_ignores=[])
            module_end_str = ast.unparse(module_end)  # type: ignore
            io_buffer = StringIO()
            try:
                with redirect_stdout(io_buffer):
                    ret = eval(module_end_str, self.globals, self.locals)
                    if ret is None:
                        return io_buffer.getvalue()
                    else:
                        return ret
            except Exception:
                with redirect_stdout(io_buffer):
                    exec(module_end_str, self.globals, self.locals)
                return io_buffer.getvalue()
        except Exception as e:
            return "{}: {}".format(type(e).__name__, str(e))

    async def _arun(
        self,
        query: str,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("PythonReplTool does not support async")

To use it with create_pandas_dataframe_agent() you will also have to create a customized version of 'langchain/agents/agent_toolkits/pandas/base.py' where you change the import of PythonAstREPLTool to the fixed version.
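
If you would rather not fork the toolkit module, a rough sketch of wiring the patched tool in by hand, assuming the PythonAstREPLTool class defined above is in scope (the CSV path is just the one used earlier in this thread):

import pandas as pd
from langchain.agents import initialize_agent
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI

df = pd.read_csv("./data/csv/titanic.csv")

# Patched tool from above, with args_schema set and the dataframe exposed to the REPL.
tool = PythonAstREPLTool(locals={"df": df})

agent = initialize_agent(
    [tool],
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
)

agent.run("how many rows are there?")

Note this skips the dataframe-aware prompt prefix that create_pandas_dataframe_agent normally builds, so you may need to mention df explicitly in your questions.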

I hope it will be fixed soon.

elokus avatar Jul 12 '23 15:07 elokus

@elokus I made the suggested changes in my fork but am still running into the same error. Were there any additional changes that needed to be made? Thanks!

vai0 avatar Jul 20 '23 16:07 vai0

Found the issue -- the arguments contain special characters (newlines, \n), and they are not escaped before being passed into json.loads in langchain/agents/openai_functions_agent/base.py. That is why it works sometimes (e.g., when the LLM generates an action that doesn't contain newlines).

To fix this, you need to json.dumps(function_call["arguments"]) before loading it with json.loads in langchain/agents/openai_functions_agent/base.py.
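
A rough sketch of the failure and a tolerant way to handle it (the fallback shape here is an illustration, not the exact upstream patch):

import json

function_call = {"name": "python", "arguments": "passenger_names = df['Name']\npassenger_names"}

try:
    # Current behaviour: this raises because the arguments are bare Python, not JSON.
    tool_input = json.loads(function_call["arguments"])
except json.JSONDecodeError:
    # Tolerant fallback: pass the raw (possibly multi-line) string through as the tool input.
    tool_input = {"query": function_call["arguments"]}

print(tool_input)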

vai0 avatar Jul 20 '23 19:07 vai0

@elokus I made the suggested changes in my fork but am still running into the same error. Were there any additional changes that needed to be made? Thanks!

It works! Thank you.

315930399 avatar Jul 31 '23 02:07 315930399

Hi, @yockgen! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding of the current state of the issue, there was a bug in the csv agent that caused all queries to fail with a JSON error. Some users suggested passing the names of CSV files directly instead of a list of filenames, which seemed to work for some queries but not others. There were also mentions of similar problems with different agent types. However, it appears that the issue has been resolved by adding an args schema to the PythonAstREPLTool and escaping special characters in the arguments.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot[bot] avatar Oct 30 '23 16:10 dosubot[bot]

Which version of langchain did you use?

Siddharth-1698 avatar Nov 06 '23 12:11 Siddharth-1698

@baskaryan Could you please help @Siddharth-1698 with this issue? They are asking which version of LangChain they should use. Thank you!

dosubot[bot] avatar Nov 06 '23 12:11 dosubot[bot]

Hi, @yockgen,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported involved a bug in the csv agent of Langchain, causing all queries to fail with a JSON error. It seems that the problem was resolved by escaping special characters in the arguments. The LangChain team is considering closing the issue, pending confirmation from you.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

dosubot[bot] avatar Feb 08 '24 16:02 dosubot[bot]