pandas-ai
SyntaxError: invalid syntax changing the query in the demo example
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

llm = OpenAI(api_token="")
pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='What is the data about?')
Running the code above, with the prompt changed to "What is the data about?", raises the following error. It looks like the model is still able to describe the data, but the run ends in a SyntaxError.
Traceback (most recent call last):
......
File "<unknown>", line 2
The data is about a dataframe with 26 columns and 5 rows. The columns include api_id, email, name, phone number, and various survey questions such as age, gender, and income. There is also a column for whether the respondent was invited by a friend.
^^^^
SyntaxError: invalid syntax
I am running into this issue as well. It looks like it's treating the prompt as code, hence the syntax error. Wrapping the query in quotes,
pandas_ai.run(df, prompt=f'"{query}"')
resulted in the same error.
Sounds very weird. @neicras @snacsnoc, can you try instantiating PandasAI with verbose=True, as shown here, and attach the log?
pandas_ai = PandasAI(llm, verbose=True)
@gventuri I tried to reproduce this exception:
In[1]: import pandas as pd
...: from examples.data.sample_dataframe import dataframe
...:
...: from pandasai import PandasAI
...: from pandasai.llm.openai import OpenAI
...:
...: df = pd.DataFrame(dataframe)
...:
...: llm = OpenAI()
...: pandas_ai = PandasAI(llm, verbose=True, conversational=False)
...: response = pandas_ai.run(df, "Calculate the sum of the gdp of north american countries")
...: print(response)
Running PandasAI with openai LLM...
Code generated:
df.loc[df['country'].isin(['Canada', 'Mexico', 'United States']), 'gdp'].sum()
Answer: 20901884461056
20901884461056
Now when executing this prompt,
In[2]: response = pandas_ai.run(df, "what is this data about?")
Stack:
Running PandasAI with openai LLM...
Traceback (most recent call last):
File "/Users/pandas-ai/testenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-9e65e54f8bb4>", line 1, in <module>
response = pandas_ai.run(df, "what is this data about?")
File "/Users/pandas-ai/pandasai/__init__.py", line 126, in run
code = self._llm.generate_code(
File "/Users/pandas-ai/pandasai/llm/base.py", line 109, in generate_code
return self._extract_code(self.call(instruction, prompt, suffix="\n\nCode:\n"))
File "/Users/pandas-ai/pandasai/llm/base.py", line 84, in _extract_code
raise NoCodeFoundError("No code found in the response")
pandasai.exceptions.NoCodeFoundError: No code found in the response
- This is raised by the check for whether the response is valid Python code:
    if not self._is_python_code(code):
        raise NoCodeFoundError("No code found in the response")
- Surprisingly, it works with the prompt "what is this data?", because the response is a Python code snippet:
In[3]: response = pandas_ai.run(df, "what is this data?")
Running PandasAI with openai LLM...
Code generated:
import pandas as pd
data = {'country': ['France', 'United Kingdom', 'Italy', 'United States', 'Germany'],
'gdp': [2411255037952, 1340019902, 7109369020, 9913964056, 4288321544],
'happiness_index': [6.38, 7.07, 6.66, 7.07, 6.66]}
df = pd.DataFrame(data)
print(df.head(5))
Answer: country gdp happiness_index
0 United States 19294482071552 6.94
1 United Kingdom 2891615567872 7.16
2 France 2411255037952 6.66
3 Germany 3435817336832 7.07
4 Italy 1745433788416 6.
- If I remove the valid-code check, I end up at the SyntaxError raised by the AST parsing step (tree = ast.parse(code)):
In[4]: response = pandas_ai.run(df, "what is this data about?")
Running PandasAI with openai LLM...
Code generated:
The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'. The first five rows of the dataframe are printed using the command 'print(df.head(5))'.
Traceback (most recent call last):
File "/Users/sanchit/pandas-ai/testenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-9e65e54f8bb4>", line 1, in <module>
response = pandas_ai.run(df, "what is this data about?")
File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 156, in run
answer = self.run_code(
File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 199, in run_code
code_to_run = self.remove_unsafe_imports(code)
File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 174, in remove_unsafe_imports
tree = ast.parse(code)
File "/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ast.py", line 50, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'. The first five rows of the dataframe are printed using the command 'print(df.head(5))'.
^
SyntaxError: invalid syntax
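The failure in the traceback above can be reproduced without calling an LLM at all, since `ast.parse` raises `SyntaxError` for ordinary prose. The sketch below is only an illustration: `is_python_code` is a hypothetical stand-in for the library's `_is_python_code` check, whose actual implementation is not shown in this thread.

```python
import ast


def is_python_code(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as Python source."""
    try:
        ast.parse(text)
        return True
    except SyntaxError:
        return False


# A prose answer, like the one the LLM returned for "what is this data about?"
prose = "The data is about the GDP and happiness index of different countries."
# A code answer, like the one returned for the GDP question
code = "df.loc[df['country'].isin(['Canada', 'United States']), 'gdp'].sum()"

print(is_python_code(prose))  # False
print(is_python_code(code))   # True
```

Note that a parse check on its own is a weak filter: a one-word answer such as `Unknown` is a valid Python expression and would pass it.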
To work around this, I tried adding a trick to the valid-Python check, because, as you can see, the response returned by the LLM contains a Python code snippet. Unfortunately the response is not always in this form, and sometimes it is only this string:
The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'.
Please have a look at the PR to fix this scenario.
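For illustration only, here is one way the two response shapes described above (prose that quotes a snippet, and prose with no code at all) could be told apart. This is a sketch under assumptions, not necessarily what the PR does: `contains_call` and `classify_response` are hypothetical names, and the heuristic of looking for a single-quoted call expression is an assumption based on the sample responses in this thread.

```python
import ast
import re


def contains_call(source: str) -> bool:
    """True if `source` parses as Python and contains at least one call."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(isinstance(node, ast.Call) for node in ast.walk(tree))


def classify_response(response: str):
    """Hypothetical helper: return ("code", source) or ("text", prose)."""
    text = response.strip()
    # Case 1: the whole response is already valid, non-empty Python.
    try:
        ast.parse(text)
        if text:
            return "code", text
    except SyntaxError:
        pass
    # Case 2: prose that quotes a snippet such as 'print(df.head(5))'.
    for candidate in re.findall(r"'([^'\n]+)'", text):
        if contains_call(candidate):
            return "code", candidate
    # Case 3: prose only; return it as the answer instead of parsing it.
    return "text", text


with_snippet = (
    "The data is about the GDP and happiness index of different countries. "
    "The first five rows of the dataframe are printed using the command "
    "'print(df.head(5))'."
)
prose_only = (
    "The data is about the GDP and happiness index of different countries. "
    "The dataframe has 10 rows and 3 columns, with columns named 'country', "
    "'gdp', and 'happiness_index'."
)

print(classify_response(with_snippet))
print(classify_response(prose_only)[0])
```

The quoted column names in the prose-only response parse as Python too, which is why the sketch requires a call expression before treating a quoted fragment as code.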