langchain
Bug in CSV agent: basically all queries fail with a JSON error
System Info
Python version: Python 3.10.6
Langchain version: 0.0.219
OS: Ubuntu 22.04
Who can help?
@eyurtsev
Information
- [X] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
I have simple code like this:
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import os
import sys

directory = './test'
f = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        f.append(directory + "/" + filename)

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    f,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

qry = "how many rows are there?"
while True:
    if not qry:
        qry = input("Q: ")
    if qry in ['quit', 'q', 'exit']:
        sys.exit()
    agent.run(qry)
    qry = None
I'm using the Titanic dataset:
https://github.com/datasciencedojo/datasets/blob/master/titanic.csv
Error as below
$ python3 langchain-csv.py
> Entering new chain...
Traceback (most recent call last):
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 112, in _parse_ai_message
_tool_input = json.loads(function_call["arguments"])
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/mytuition/langchain-csv.py", line 32, in <module>
agent.run(qry)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 290, in run
return self(args[0], callbacks=callbacks, tags=tags)[_output_key]
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 166, in __call__
raise e
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 160, in __call__
self._call(inputs, run_manager=run_manager)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 987, in _call
next_step_output = self._take_next_step(
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 803, in _take_next_step
raise e
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _take_next_step
output = self.agent.plan(
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 212, in plan
agent_decision = _parse_ai_message(predicted_message)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 114, in _parse_ai_message
raise OutputParserException(
langchain.schema.OutputParserException: Could not parse tool input: {'name': 'python', 'arguments': 'len(df)'} because the `arguments` is not valid JSON.
Expected behavior
LangChain should return the total number of rows.
Hi @yockgen, try passing names of CSV files directly instead of a list of CSV filenames.
agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    'titanic.csv',
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)
I'm also running into the same problem, passing files directly does not help.
@dzmitryashkinadze I've changed my code to pass in the file name directly. It runs with my previous query, but I get the same error when I ask some other questions, see below:
Code:
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import os
import sys

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
    "./data/csv/titanic.csv",
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

#qry = "give some insights on the dataset?" <-- this query is running fine
#qry ="how many rows are there?" <-- this query is running fine
qry = "listing all passenger's name"  # <-- this query is throwing error
while True:
    if not qry:
        qry = input("Q: ")
    if qry in ['quit', 'q', 'exit']:
        sys.exit()
    agent.run(qry)
    qry = None
Error:
# python3 langchain-csv.py
> Entering new chain...
Traceback (most recent call last):
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 112, in _parse_ai_message
_tool_input = json.loads(function_call["arguments"])
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/mytuition/langchain-csv.py", line 35, in <module>
agent.run(qry)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 290, in run
return self(args[0], callbacks=callbacks, tags=tags)[_output_key]
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 166, in __call__
raise e
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 160, in __call__
self._call(inputs, run_manager=run_manager)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 987, in _call
next_step_output = self._take_next_step(
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 803, in _take_next_step
raise e
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _take_next_step
output = self.agent.plan(
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 212, in plan
agent_decision = _parse_ai_message(predicted_message)
File "/home/yockgenm/.local/lib/python3.10/site-packages/langchain/agents/openai_functions_agent/base.py", line 114, in _parse_ai_message
raise OutputParserException(
langchain.schema.OutputParserException: Could not parse tool input: {'name': 'python', 'arguments': "passenger_names = df['Name']\npassenger_names"} because the `arguments` is not valid JSON.
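For anyone who wants to see the failure in isolation, here is a minimal sketch that feeds the two `arguments` values from the tracebacks above straight into `json.loads`, without LangChain or an API key. The `try_parse` helper and the `"query"` key in the last call are only illustrative assumptions, not part of LangChain.

```python
import json

# The `arguments` strings copied from the two error messages above.
failing_arguments = [
    "len(df)",
    "passenger_names = df['Name']\npassenger_names",
]


def try_parse(raw):
    """Hypothetical helper: return the parsed value, or the error text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc}"


for raw in failing_arguments:
    # Both strings are bare Python code, not JSON documents.
    print(repr(raw), "->", try_parse(raw))

# The same code wrapped in a JSON object parses without error.
# The "query" key is an assumption made for illustration.
print(try_parse(json.dumps({"query": "len(df)"})))
```

Both strings are rejected at the very first character, which matches the `line 1 column 1 (char 0)` position in the tracebacks.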
Did anybody find a solution to this?
When I use OpenAI as the LLM for this agent, there's no problem. But if I use ChatOpenAI, I get this error.
@PradHolla I have similar problems. If I use OpenAI, which uses the davinci model, there is no error. But if I use the chat model, I get all kinds of errors. The errors with 'agent_type=AgentType.OPENAI_FUNCTIONS' and with 'agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION' are also quite different. The reason I want to switch from davinci to the chat model is that OpenAI now considers the current davinci models 'legacy', without a good replacement. Davinci is also more expensive than turbo:
Davinci | $0.0300 / 1K tokens | $0.1200 / 1K tokens
GPT-3.5 Turbo 4K context | $0.0015 / 1K tokens | $0.002 / 1K tokens
I've experienced similar problems. To fix this you will need to add 'args_schema' to PythonAstREPLTool. I have basically created a copy of 'langchain/tools/python/tool.py' and added an args schema with a description, so that OpenAIFunction agents as well as ZeroShot agents pass the right tool input.
Code:
class AstArgSchema(BaseModel):
    """A schema for the ast argument."""

    query: str = Field(description="A string formatted plain python script with imports and variables to execute.")


class PythonAstREPLTool(BaseTool):
    """A tool for running python code in a REPL."""

    name = "python_repl_ast"
    description = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "When using this tool, sometimes output is abbreviated - "
        "make sure it does not look abbreviated before using it in your answer."
    )
    globals: Optional[Dict] = Field(default_factory=dict)
    locals: Optional[Dict] = Field(default_factory=dict)
    sanitize_input: bool = True
    args_schema: Type[BaseModel] = AstArgSchema

    @root_validator(pre=True)
    def validate_python_version(cls, values: Dict) -> Dict:
        """Validate valid python version."""
        if sys.version_info < (3, 9):
            raise ValueError(
                "This tool relies on Python 3.9 or higher "
                "(as it uses new functionality in the `ast` module, "
                f"you have Python version: {sys.version}"
            )
        return values

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        try:
            if self.sanitize_input:
                query = sanitize_input(query)
            tree = ast.parse(query)
            module = ast.Module(tree.body[:-1], type_ignores=[])
            exec(ast.unparse(module), self.globals, self.locals)  # type: ignore
            module_end = ast.Module(tree.body[-1:], type_ignores=[])
            module_end_str = ast.unparse(module_end)  # type: ignore
            io_buffer = StringIO()
            try:
                with redirect_stdout(io_buffer):
                    ret = eval(module_end_str, self.globals, self.locals)
                    if ret is None:
                        return io_buffer.getvalue()
                    else:
                        return ret
            except Exception:
                with redirect_stdout(io_buffer):
                    exec(module_end_str, self.globals, self.locals)
                return io_buffer.getvalue()
        except Exception as e:
            return "{}: {}".format(type(e).__name__, str(e))

    async def _arun(
        self,
        query: str,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("PythonReplTool does not support async")
To use it with create_pandas_dataframe_agent() you will also have to create a customized version of 'langchain/agents/agent_toolkits/pandas/base.py' in which you change the import of PythonAstREPLTool to the fixed version.
I hope it will be fixed soon.
@elokus I made the suggested changes in my fork but am still running into the same error. Were there any additional changes that needed to be made? Thanks!
Found the issue -- the arguments contains special characters (newlines \n), and they are not being escaped before it is passed into json.loads in langchain/agents/openai_functions_agent/base.py, hence the reason why it works sometimes (e.g., when the LLM generates an action that doesn't contain newlines).
To fix this, you need to json.dumps(functions["arguments"]) before loading it in json.loads in langchain/agents/openai_functions_agent/base.py.
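A minimal sketch of that workaround using only the standard library, for readers who want to see what the round trip does. Note that `json.loads(json.dumps(s))` returns the original string `s` unchanged, so the tool input ends up being the raw code string rather than a dict. `parse_tool_arguments` is a hypothetical helper written for this illustration; it is not LangChain's actual code.

```python
import json


def parse_tool_arguments(raw: str):
    """Hypothetical helper: parse JSON when possible, else keep the raw string.

    Falling back to the raw string has the same effect as
    json.loads(json.dumps(raw)) for inputs that are not valid JSON.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


# The `arguments` value from the second traceback in this thread.
raw = "passenger_names = df['Name']\npassenger_names"

# json.dumps escapes the newline, so json.loads can read it back as a string.
round_tripped = json.loads(json.dumps(raw))
print(round_tripped == raw)  # the round trip yields the original code string

# The fallback helper returns the same value for non-JSON input...
print(parse_tool_arguments(raw) == raw)

# ...and still returns a dict when the model does send valid JSON.
# The "query" key is an assumption for illustration.
print(parse_tool_arguments('{"query": "len(df)"}'))
```

The try/except form keeps valid JSON arguments as a dict, whereas always calling `json.dumps` first would turn every argument into a plain string.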
It works! Thank you.
Hi, @yockgen! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding of the current state of the issue, there was a bug in the csv agent that caused all queries to fail with a JSON error. Some users suggested passing the names of CSV files directly instead of a list of filenames, which seemed to work for some queries but not others. There were also mentions of similar problems with different agent types. However, it appears that the issue has been resolved by adding an args schema to the PythonAstREPLTool and escaping special characters in the arguments.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
Which version of langchain did you use?
@baskaryan Could you please help @Siddharth-1698 with this issue? They are asking which version of LangChain they should use. Thank you!
Hi, @yockgen,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported involved a bug in the csv agent of Langchain, causing all queries to fail with a JSON error. It seems that the problem was resolved by escaping special characters in the arguments. The LangChain team is considering closing the issue, pending confirmation from you.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation.