pandas-ai SyntaxError: invalid syntax changing the query in the demo example

import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="")

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='What is the data about?')

The above code (with the prompt changed to "What is the data about?") gives the following error. It looks like it is still able to describe the data, but it raises a SyntaxError.

Traceback (most recent call last):
 ......
  File "<unknown>", line 2
    The data is about a dataframe with 26 columns and 5 rows. The columns include api_id, email, name, phone number, and various survey questions such as age, gender, and income. There is also a column for whether the respondent was invited by a friend. 
        ^^^^
SyntaxError: invalid syntax

May 03 '23 06:05 neicras

I am running into this issue as well. It looks like it's treating the prompt as code, hence the syntax error.

pandas_ai.run(df, prompt=f'"{query}"')

resulted in the same error.

May 04 '23 03:05 snacsnoc

Sounds very weird. Can you try to instantiate PandasAI with verbose=True? @neicras @snacsnoc

pandas_ai = PandasAI(llm, verbose=True)

and attach the log?

May 04 '23 13:05 gventuri

@gventuri I tried to reproduce this exception:

import pandas as pd
from examples.data.sample_dataframe import dataframe

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

df = pd.DataFrame(dataframe)

llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(df, "Calculate the sum of the gdp of north american countries")
print(response)


Running PandasAI with openai LLM...

Code generated:

df.loc[df['country'].isin(['Canada', 'Mexico', 'United States']), 'gdp'].sum()

Answer: 20901884461056
20901884461056

Now, when executing this prompt:

In[2]: response = pandas_ai.run(df, "what is this data about?")

Stack trace:

Running PandasAI with openai LLM...

Traceback (most recent call last):
  File "/Users/pandas-ai/testenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-9e65e54f8bb4>", line 1, in <module>
    response = pandas_ai.run(df, "what is this data about?")
  File "/Users/pandas-ai/pandasai/__init__.py", line 126, in run
    code = self._llm.generate_code(
  File "/Users/pandas-ai/pandasai/llm/base.py", line 109, in generate_code
    return self._extract_code(self.call(instruction, prompt, suffix="\n\nCode:\n"))
  File "/Users/pandas-ai/pandasai/llm/base.py", line 84, in _extract_code
    raise NoCodeFoundError("No code found in the response")
pandasai.exceptions.NoCodeFoundError: No code found in the response

This is raised by the check for whether the response is valid Python code:

if not self._is_python_code(code):
    raise NoCodeFoundError("No code found in the response")
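
For illustration, a check of this kind could be sketched as below. This is only an assumption about the general approach, not the actual `_is_python_code` implementation in pandasai; `looks_like_python` is a hypothetical name. It simply asks whether the text parses as Python source.

```python
import ast


def looks_like_python(text: str) -> bool:
    """Heuristic (hypothetical helper): True if the text parses as Python source."""
    try:
        ast.parse(text)
    except SyntaxError:
        return False
    return True


# Generated code parses fine; parsing does not execute it, so df need not exist.
print(looks_like_python("df.loc[df['country'].isin(['Canada']), 'gdp'].sum()"))
# A prose answer from the LLM fails to parse.
print(looks_like_python("The data is about the GDP and happiness index of different countries."))
# Caveat: a single bare word is a valid Python expression (a name), so it passes.
print(looks_like_python("Yes"))
```

Note the caveat in the last call: a parse-only check rejects multi-word prose but cannot tell a one-word answer from a variable name, so it is a heuristic rather than a guarantee.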

Surprisingly, it works with this prompt ("what is this data?") because the response contains a Python code snippet:

In[3]: response = pandas_ai.run(df, "what is this data?")

Running PandasAI with openai LLM...

Code generated:

import pandas as pd
data = {'country': ['France', 'United Kingdom', 'Italy', 'United States', 'Germany'],
        'gdp': [2411255037952, 1340019902, 7109369020, 9913964056, 4288321544],
        'happiness_index': [6.38, 7.07, 6.66, 7.07, 6.66]}
df = pd.DataFrame(data)
print(df.head(5))

Answer:           country             gdp  happiness_index
0   United States  19294482071552             6.94
1  United Kingdom   2891615567872             7.16
2          France   2411255037952             6.66
3         Germany   3435817336832             7.07
4           Italy   1745433788416             6.

If I remove the valid-code check, I end up at the syntax error raised by the AST parsing step (tree = ast.parse(code)):

In[4]: response = pandas_ai.run(df, "what is this data about?")
Running PandasAI with openai LLM...

Code generated:

The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'. The first five rows of the dataframe are printed using the command 'print(df.head(5))'.

Traceback (most recent call last):
  File "/Users/sanchit/pandas-ai/testenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-9e65e54f8bb4>", line 1, in <module>
    response = pandas_ai.run(df, "what is this data about?")
  File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 156, in run
    answer = self.run_code(
  File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 199, in run_code
    code_to_run = self.remove_unsafe_imports(code)
  File "/Users/sanchit/pandas-ai/pandasai/__init__.py", line 174, in remove_unsafe_imports
    tree = ast.parse(code)
  File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'. The first five rows of the dataframe are printed using the command 'print(df.head(5))'.
        ^
SyntaxError: invalid syntax
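
The failure can be reproduced outside pandasai with the standard library alone. The sketch below (with a hypothetical helper name, `parse_error_summary`) shows that `ast.parse` raises the SyntaxError before anything is executed, and that the `File "<unknown>"` in the traceback is just the default filename `ast.parse` uses when none is given.

```python
import ast


def parse_error_summary(text):
    """Hypothetical helper: None if text parses, else (filename, lineno) of the SyntaxError."""
    try:
        ast.parse(text)  # filename defaults to "<unknown>"
    except SyntaxError as exc:
        return exc.filename, exc.lineno
    return None


prose = "The data is about the GDP and happiness index of different countries."
print(parse_error_summary(prose))
# The snippet parses even though df is undefined here: parsing never runs the code.
print(parse_error_summary("print(df.head(5))"))
```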

Therefore, I tried adding a trick to the valid Python code check. As you can see, the response returned by the LLM contains a Python code snippet. Unfortunately, the response is not always the same, and sometimes it contains only this string:

The data is about the GDP and happiness index of different countries. The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', and 'happiness_index'.
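
To make the two cases concrete, here is a minimal sketch of one way to pull an embedded snippet out of a prose response. It is not necessarily what the PR does; `extract_code_snippet` is a hypothetical name, and the quoting heuristic is an assumption that breaks on text containing apostrophes.

```python
import ast
import re
from typing import Optional


def extract_code_snippet(response: str) -> Optional[str]:
    """Hypothetical helper: first quoted fragment that parses as a call or assignment."""
    # Candidates are fragments wrapped in single quotes or backticks.
    for candidate in re.findall(r"['`]([^'`\n]+)['`]", response):
        try:
            tree = ast.parse(candidate)
        except SyntaxError:
            continue
        # Column names such as 'gdp' also parse (as bare names), so require
        # a call or an assignment somewhere in the fragment.
        if any(isinstance(node, (ast.Call, ast.Assign)) for node in ast.walk(tree)):
            return candidate
    return None


with_snippet = (
    "The data is about the GDP and happiness index of different countries. "
    "The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', "
    "and 'happiness_index'. The first five rows of the dataframe are printed using "
    "the command 'print(df.head(5))'."
)
without_snippet = (
    "The data is about the GDP and happiness index of different countries. "
    "The dataframe has 10 rows and 3 columns, with columns named 'country', 'gdp', "
    "and 'happiness_index'."
)

print(extract_code_snippet(with_snippet))     # print(df.head(5))
print(extract_code_snippet(without_snippet))  # None
```

When no snippet is found, as in the second response, there is nothing to execute, so the caller would have to treat the text itself as the answer or raise an error.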

Please have a look at the PR to fix this scenario.

May 15 '23 17:05 sandiemann
