
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

Open prtolem opened this issue 1 year ago • 10 comments

When I call this code:

pandas_ai.run(
    data,
    "plot the growth of Internet popularity in Entity Russia",
)

this error is displayed:

FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

The file name may vary, but that's not the point. I think this happens because ChatGPT generates code that starts with the imports and the loading of the dataset. It could be solved by removing those lines from the generated code, using regular expressions or some other way. I haven't solved this problem yet.

prtolem • May 04 '23 18:05

Hey @prtolem,

Can you try to instantiate PandasAI with verbose=True?

pandas_ai = PandasAI(llm, verbose=True)

and attach the log?

gventuri • May 04 '23 18:05

It doesn't output anything new. The traceback is the same as before. Here it is:

FileNotFoundError                         Traceback (most recent call last)
Cell In[7], line 1
----> 1 pandas_ai.run(
      2     data,
      3     "plot the growth of Internet popularity in Entity Russia",
      4 )

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandasai\__init__.py:119, in PandasAI.run(self, data_frame, prompt, is_conversational_answer, show_code)
    116 if show_code and self._in_notebook:
    117     self.notebook.create_new_cell(code)
--> 119 answer = self.run_code(code, data_frame, False)
    120 self.code_output = answer
    121 self.log(f"Answer: {answer}")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandasai\__init__.py:165, in PandasAI.run_code(self, code, df, use_error_correction_framework)
    161             code_to_run = self._llm.generate_code(
    162                 error_correcting_instruction, ""
    163             )
    164 else:
--> 165     exec(code)
    167 # Restore standard output and get the captured output
    168 sys.stdout = sys.__stdout__

File <string>:4

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    899 kwds_defaults = _refine_defaults_read(
    900     dialect,
    901     delimiter,
   (...)
    908     dtype_backend=dtype_backend,
    909 )
    910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:577, in _read(filepath_or_buffer, kwds)
    574 _validate_names(kwds.get("names", None))
    576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
    579 if chunksize or iterator:
    580     return parser

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
   1404     self.options["has_index_names"] = kwds["has_index_names"]
   1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1661, in TextFileReader._make_engine(self, f, engine)
   1659     if "b" not in mode:
   1660         mode += "b"
-> 1661 self.handles = get_handle(
   1662     f,
   1663     mode,
   1664     encoding=self.options.get("encoding", None),
   1665     compression=self.options.get("compression", None),
   1666     memory_map=self.options.get("memory_map", False),
   1667     is_text=is_text,
   1668     errors=self.options.get("encoding_errors", "strict"),
   1669     storage_options=self.options.get("storage_options", None),
   1670 )
   1671 assert self.handles is not None
   1672 f = self.handles.handle

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\common.py:859, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    854 elif isinstance(handle, str):
    855     # Check whether the filename is to be opened in binary mode.
    856     # Binary mode does not support 'encoding' and 'newline'.
    857     if ioargs.encoding and "b" not in ioargs.mode:
    858         # Encoding
--> 859         handle = open(
    860             handle,
    861             ioargs.mode,
    862             encoding=ioargs.encoding,
    863             errors=errors,
    864             newline="",
    865         )
    866     else:
    867         # Binary mode
    868         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

I can also send you my notebook if you need it.

prtolem • May 04 '23 18:05

Same problem here. Sometimes it also tries to access a column name that doesn't exist.

/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    857         if ioargs.encoding and "b" not in ioargs.mode:
    858             # Encoding
--> 859             handle = open(
    860                 handle,
    861                 ioargs.mode,

FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

Ink6220 • May 04 '23 19:05

How can I rewrite the query so that it understands that I need statistics for the rows where the Entity column has the value Russia?

prtolem • May 04 '23 19:05

@prtolem, would you mind sharing the notebook so that I can look into it and reproduce the issue? It might be a hallucination of the LLM!

gventuri • May 05 '23 12:05

This is what I used as a temporary workaround for the filename.csv problem in my application, in case it's useful to anyone. It just copies the chosen data file to filename.csv (replacing any existing copy), so the generated code finds the file it expects. Cheap but effective :)

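import os
import shutil

# Keep asking until the user names a file that exists.
while True:
    dataSource = input("\nEnter file name and extension (File must be inside of the PlastiGPT folder): ")

    if os.path.isfile(dataSource):
        # Replace any stale copy, then expose the data under the name the generated code expects.
        if os.path.isfile("filename.csv"):
            os.remove("filename.csv")
        shutil.copyfile(dataSource, "filename.csv")
        break
    else:
        print("\nThe file name you entered doesn't exist. Be sure to include the file extension")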

JeffreyLind3 • May 05 '23 14:05

I happened to check cmd and saw the run() execution log there. This is the code that ChatGPT generated:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('filename.csv') # replace filename with actual file name

russia_df = df[df['Entity'] == 'Russia']
plt.plot(russia_df['Year'], russia_df['Internet Users(%)'])
plt.xlabel('Year')
plt.ylabel('Internet Users(%)')
plt.title('Internet Popularity in Russia')
plt.show()

I think it's worth either removing this part programmatically or adding something like "leave only the main part of the code, without importing and loading the dataset" to the prompt.
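The programmatic removal suggested here could be sketched with the standard-library ast module rather than regular expressions. This is only a minimal sketch, not how pandasai handles it: strip_data_loading is a hypothetical helper, and it assumes the caller already provides pd, plt, and the dataframe under the name the generated code uses (df in this case).

```python
import ast


def strip_data_loading(code: str) -> str:
    """Drop top-level imports and top-level `x = <something>.read_csv(...)` assignments.

    Hypothetical helper for illustration; not part of pandasai.
    """
    tree = ast.parse(code)
    kept = []
    for node in tree.body:
        # Skip `import ...` and `from ... import ...` statements.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        # Skip assignments whose right-hand side is a call to an attribute named read_csv.
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Attribute)
            and node.value.func.attr == "read_csv"
        ):
            continue
        kept.append(node)
    tree.body = kept
    # ast.unparse requires Python 3.9+; comments in the original code are not preserved.
    return ast.unparse(tree)


generated = """import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('filename.csv') # replace filename with actual file name

russia_df = df[df['Entity'] == 'Russia']
plt.plot(russia_df['Year'], russia_df['Internet Users(%)'])
"""

cleaned = strip_data_loading(generated)
print(cleaned)
```

Parsing the code instead of pattern-matching lines avoids accidentally deleting a line that merely mentions read_csv inside a string, at the cost of losing comments from the generated code.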

prtolem • May 05 '23 16:05

Ok this is totally wrong, shouldn't happen! Thanks for submitting @prtolem. Could you provide the exact same prompt you are using to generate it so that we can investigate further?

gventuri • May 05 '23 17:05

You can take a look at the project here. Most of it is written in Russian, so use a translator to understand the code.

prtolem • May 05 '23 18:05

Same problem here

yvann-ba • May 06 '23 17:05

@prtolem I am not able to reproduce this. Are you using the latest pandasai version? You can check by running pip freeze | grep pandasai, and please try upgrading to the latest version with pip install pandasai -U (latest version: v0.2.11).
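Since grep is not available by default in the Windows command prompt (the traceback above comes from a Windows installation), the same check could be done from Python. A minimal sketch using the standard-library importlib.metadata; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pandasai version, or None if pandasai is not installed.
print(installed_version("pandasai"))
```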

Here's what I did:

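import pandas as pd

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Assumption: the setup lines were not shown in the post; replace the placeholder with your own token.
llm = OpenAI(api_token="YOUR_API_TOKEN")
pandas_ai = PandasAI(llm)

data = pd.read_csv('internet_users.csv')

pandas_ai.run(
    data,
    "plot the growth of Internet popularity in Entity Russia",
)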

Code generated:

russia_df = df[df['Entity'] == 'Russia']
russia_df['Year'] = pd.to_datetime(russia_df['Year'], format='%Y')
russia_df.set_index('Year', inplace=True)
russia_df['No. of Internet Users'].plot()
plt.title('Internet Popularity in Russia')
plt.xlabel('Year')
plt.ylabel('No. of Internet Users')
plt.show()

[Image: Figure_1, the generated plot]

That said, the code generated by the LLM is not entirely correct: the dataframe should be named data instead of df, but this has already been reported. Please follow this thread regarding the prompt for an existing df.
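To illustrate why the variable name in generated code matters: code run through exec resolves names in whatever namespace it is given, so the caller decides what df refers to. The following is a minimal sketch under that assumption, not pandasai's implementation; run_generated is a hypothetical helper, and a list of dictionaries with made-up rows stands in for a dataframe.

```python
def run_generated(code: str, dataframe):
    """Execute generated code with the caller's data bound to the name `df`.

    Hypothetical helper for illustration; not part of pandasai.
    """
    namespace = {"df": dataframe}  # the generated code only ever sees `df`
    exec(code, namespace)
    # By convention in this sketch, the generated code stores its answer in `result`.
    return namespace.get("result")


# Stand-in for a dataframe: a list of row dictionaries (illustrative data only).
data = [
    {"Entity": "Russia", "Year": 2000},
    {"Entity": "Brazil", "Year": 2000},
]

generated = "result = [row for row in df if row['Entity'] == 'Russia']"
russia_rows = run_generated(generated, data)
print(russia_rows)
```

With an explicit namespace, the generated code works regardless of what the user's own variable is called (data here), as long as the prompt and the executor agree on the name df.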

sandiemann • May 15 '23 22:05

@sandiemann Can confirm: as of v0.2.11 this problem seems to be fixed, along with #52.

JeffreyLind3 • May 16 '23 04:05

@gventuri you can close this.

sandiemann • May 16 '23 11:05

This issue seems to have reappeared in pandasai v0.2.12.

Edit: Fixed in v0.2.13.

JeffreyLind3 • May 16 '23 15:05

@JeffreyLind3 Thanks for confirming. @gventuri you can close this issue.

sandiemann • May 19 '23 18:05