pandas-ai
Allow importing packages, capture the error and allow the user to choose whether to install them or not
I see that your current approach to the installation of new libraries is to strongly discourage imports in the prompt:
Return the python code (do not import anything) and make sure to prefix the python code with <startCode> exactly and suffix the code with <endCode> exactly to get the answer to the following question
I don't think this is optimal: some of the best Pandas code generated by GPT-4 requires importing seaborn or numpy. Also, shouldn't at least importing matplotlib be allowed? Otherwise, how do you generate plots?
IMO, it would be far better to:
- allow packages to be imported
- now the LLM generates some code that imports packages. When reading the code and before execution, capture all the import statements
- Check if all packages are already installed in the active environment (this adds a bit of complexity because now you need to understand if conda, pip or poetry is being used to install packages)
- if not, ask the user for permission to install the packages. If permission is denied, you may print an informative message and query the LLM again with a different prompt that contains the words (do not import anything).
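The capture-and-check steps above can be sketched roughly as follows. This is only an illustration, not pandas-ai code: the function names `imported_packages` and `missing_packages` are made up for this example. It uses `importlib.util.find_spec` to ask the running interpreter whether a package is importable, which sidesteps the question of whether conda, pip or poetry installed it. Note that the import name can differ from the name you would install (e.g. `sklearn` vs. `scikit-learn`), so a real install prompt would need a mapping.

```python
import ast
import importlib.util


def imported_packages(code):
    """Return the top-level package names imported by `code` (hypothetical helper)."""
    packages = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> "a"
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> "a"; relative imports (level > 0) are skipped
            packages.add(node.module.split(".")[0])
    return packages


def missing_packages(code):
    """Return imported packages that are not importable in the active environment."""
    return {
        name
        for name in imported_packages(code)
        if importlib.util.find_spec(name) is None
    }


# Stand-in for LLM-generated code; the last package name is deliberately bogus.
generated = (
    "import json\n"
    "from os import path\n"
    "import surely_not_installed_pkg.sub as s\n"
)
print(sorted(imported_packages(generated)))
print(sorted(missing_packages(generated)))
```

Because the code is only parsed, never executed, this check can run before deciding whether to ask the user about installation or to re-query the LLM.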
Many variations are possible:
- you could add a parameter `allow_imports` to `run` that switches between a prompt that allows imports and another one that doesn't
- you could never install packages, but only ask the user if they want to install the suggested packages themselves
- etc.
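As a rough sketch of the first variation: the no-imports instruction below is the one quoted earlier, while the import-friendly wording and the `build_prompt` helper are assumptions made up for this example, not the project's actual prompts or API.

```python
# Instruction text quoted from the current prompt.
NO_IMPORTS_PROMPT = (
    "Return the python code (do not import anything) and make sure to prefix "
    "the python code with <startCode> exactly and suffix the code with "
    "<endCode> exactly to get the answer to the following question"
)

# Hypothetical import-friendly variant: same text without the restriction.
IMPORTS_PROMPT = NO_IMPORTS_PROMPT.replace(" (do not import anything)", "")


def build_prompt(question, allow_imports=False):
    """Pick the instruction based on the flag (hypothetical helper)."""
    instruction = IMPORTS_PROMPT if allow_imports else NO_IMPORTS_PROMPT
    return f"{instruction}: {question}"


print(build_prompt("Which country has the highest GDP?", allow_imports=True))
```

Defaulting `allow_imports` to `False` keeps the existing behaviour unless the caller opts in.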
Totally! I think the allow_imports approach as of now gives us the most control over what happens to the code. Ideally, we should generate the smallest amount of code possible, as we want to reduce hallucinations to a minimum.
Hey @AndreaPi, we've implemented this as part of e3d7d1dc259918565c0db08d535d8fd28fa7a465!
As of PR #193 we now have a whitelist of optional libraries that initially includes sklearn, statsmodels, seaborn, plotly and ggplot. If other libraries are required, please request them in a new issue.
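For readers wondering what a whitelist check of this kind might look like, here is a minimal sketch. It is not the code from PR #193: the names `WHITELISTED_LIBRARIES` and `disallowed_imports` are assumptions for illustration, and the actual implementation may treat core dependencies such as pandas differently.

```python
import ast

# Library names taken from the comment above; the constant name is hypothetical.
WHITELISTED_LIBRARIES = {"sklearn", "statsmodels", "seaborn", "plotly", "ggplot"}


def disallowed_imports(code, whitelist=WHITELISTED_LIBRARIES):
    """Return top-level packages imported by `code` that are not whitelisted."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found - set(whitelist)


# Stand-in for generated code: seaborn is whitelisted, requests is not.
print(disallowed_imports("import seaborn as sns\nimport requests\n"))
```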