pandas-ai
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
When calling this code:
pandas_ai.run(data, "plot the growth of Internet popularity in Entity Russia")
this error is displayed:
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
The exact file name may vary; that is not the point. I think this happens because ChatGPT generates the code starting from the imports and the loading of the dataset, so it tries to read a 'filename.csv' that does not exist. It could probably be solved by stripping those few lines from the generated code with regular expressions or in some other way. I haven't solved this problem yet.
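Something like this rough sketch is what I have in mind (strip_io_lines is a hypothetical helper, not part of pandasai):
import re

def strip_io_lines(generated_code: str) -> str:
    # Hypothetical post-processing: drop import statements and read_csv calls
    # so the generated code only operates on the dataframe already in memory.
    cleaned = []
    for line in generated_code.splitlines():
        if re.match(r"\s*(import\s+\w+|from\s+\S+\s+import)", line):
            continue
        if "read_csv(" in line:
            continue
        cleaned.append(line)
    return "\n".join(cleaned)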
Hey @prtolem,
Can you try to instantiate PandasAI with verbose=True, i.e. pandas_ai = PandasAI(llm, verbose=True), and attach the log?
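For context, the full setup would look something like this (the api_token value is just a placeholder):
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

llm = OpenAI(api_token="YOUR_API_KEY")  # placeholder; use your own OpenAI key
pandas_ai = PandasAI(llm, verbose=True)  # verbose=True prints the run() log, including the generated code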
It doesn't show anything new. The error traceback is the same as before. Here it is:
FileNotFoundError Traceback (most recent call last)
Cell In[7], line 1
----> 1 pandas_ai.run(
2 data,
3 "plot the growth of Internet popularity in Entity Russia",
4 )
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandasai\__init__.py:119, in PandasAI.run(self, data_frame, prompt, is_conversational_answer, show_code)
116 if show_code and self._in_notebook:
117 self.notebook.create_new_cell(code)
--> 119 answer = self.run_code(code, data_frame, False)
120 self.code_output = answer
121 self.log(f"Answer: {answer}")
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandasai\__init__.py:165, in PandasAI.run_code(self, code, df, use_error_correction_framework)
161 code_to_run = self._llm.generate_code(
162 error_correcting_instruction, ""
163 )
164 else:
--> 165 exec(code)
167 # Restore standard output and get the captured output
168 sys.stdout = sys.__stdout__
File <string>:4
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
899 kwds_defaults = _refine_defaults_read(
900 dialect,
901 delimiter,
(...)
908 dtype_backend=dtype_backend,
909 )
910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:577, in _read(filepath_or_buffer, kwds)
574 _validate_names(kwds.get("names", None))
576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
579 if chunksize or iterator:
580 return parser
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
1404 self.options["has_index_names"] = kwds["has_index_names"]
1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1661, in TextFileReader._make_engine(self, f, engine)
1659 if "b" not in mode:
1660 mode += "b"
-> 1661 self.handles = get_handle(
1662 f,
1663 mode,
1664 encoding=self.options.get("encoding", None),
1665 compression=self.options.get("compression", None),
1666 memory_map=self.options.get("memory_map", False),
1667 is_text=is_text,
1668 errors=self.options.get("encoding_errors", "strict"),
1669 storage_options=self.options.get("storage_options", None),
1670 )
1671 assert self.handles is not None
1672 f = self.handles.handle
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\common.py:859, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
854 elif isinstance(handle, str):
855 # Check whether the filename is to be opened in binary mode.
856 # Binary mode does not support 'encoding' and 'newline'.
857 if ioargs.encoding and "b" not in ioargs.mode:
858 # Encoding
--> 859 handle = open(
860 handle,
861 ioargs.mode,
862 encoding=ioargs.encoding,
863 errors=errors,
864 newline="",
865 )
866 else:
867 # Binary mode
868 handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
I can also send you my notebook if you need it.
Same problem here. Sometimes it also tries to access a column name that doesn't exist.
File /usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
857 if ioargs.encoding and "b" not in ioargs.mode:
858 # Encoding
--> 859 handle = open(
860 handle,
861 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
How can I rewrite the query so that it understands I need statistics from the Entity column with the value Russia?
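One thing I might try is naming the column and value explicitly in the prompt, assuming the columns are 'Entity', 'Year', and 'Internet Users(%)' as in the original report, though I haven't verified this helps:
pandas_ai.run(
    data,
    "filter rows where the 'Entity' column equals 'Russia' and plot 'Internet Users(%)' against 'Year'",
)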
@prtolem Mind sharing the notebook so that I can look into it and reproduce? Might be a hallucination of the LLM!
This is what I used as a temporary workaround for the filename.csv problem in my application, in case it's useful to anyone. It simply copies the user's data file to filename.csv, so the generated code finds a file with that name. Cheap but effective :)
import os
import shutil

while True:
    dataSource = input("\nEnter file name and extension (File must be inside of the PlastiGPT folder): ")
    if os.path.isfile(dataSource):
        # Copy the user's file to 'filename.csv' so the generated code can find it
        if os.path.isfile("filename.csv"):
            os.remove("filename.csv")
        shutil.copyfile(dataSource, "filename.csv")
        break
    else:
        print("\nThe file name you entered doesn't exist. Be sure to include the file extension")
I happened to check the console and saw the run() execution log there. Here is the code that ChatGPT generated:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('filename.csv') # replace filename with actual file name
russia_df = df[df['Entity'] == 'Russia']
plt.plot(russia_df['Year'], russia_df['Internet Users(%)'])
plt.xlabel('Year')
plt.ylabel('Internet Users(%)')
plt.title('Internet Popularity in Russia')
plt.show()
I think it's worth either removing this part programmatically or adding something like "return only the main part of the code, without the imports and the dataset loading" to the prompt.
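For example, the extra instruction could just be appended to the prompt before calling run() (the exact wording here is only a sketch):
prompt = "plot the growth of Internet popularity in Entity Russia"
hint = " Use the dataframe that is already loaded; do not add imports or read any CSV file."
pandas_ai.run(data, prompt + hint)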
Ok, this is totally wrong and shouldn't happen! Thanks for submitting, @prtolem. Could you provide the exact prompt you are using to generate it so that we can investigate further?
You can get acquainted with the project here. Most of it is written in Russian; use a translator to understand the code.
Same problem here
@prtolem I am not able to reproduce this. Are you using the latest pandasai version? You can check by running
pip freeze | grep pandasai
and please try to upgrade to the latest version with
pip install pandasai -U
(latest version is v0.2.11)
Here's what I did:
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

llm = OpenAI(api_token="YOUR_API_KEY")  # placeholder; use your own OpenAI key
pandas_ai = PandasAI(llm)

data = pd.read_csv('internet_users.csv')
pandas_ai.run(
    data,
    "plot the growth of Internet popularity in Entity Russia",
)
Code generated:
russia_df = df[df['Entity'] == 'Russia']
russia_df['Year'] = pd.to_datetime(russia_df['Year'], format='%Y')
russia_df.set_index('Year', inplace=True)
russia_df['No. of Internet Users'].plot()
plt.title('Internet Popularity in Russia')
plt.xlabel('Year')
plt.ylabel('No. of Internet Users')
plt.show()
Although the code generated by the LLM is not entirely correct (the dataframe should be named data instead of df), that is already reported. Please follow this thread regarding the prompt for an existing df.
@sandiemann Can confirm, as of v0.2.11 this problem seems to be fixed, along with #52
@gventuri you can close this.
This issue seems to have reappeared in pandasai v0.2.12. Edit: fixed in v0.2.13.
@JeffreyLind22 Thanks for confirming. @gventuri you can close this issue.