pandas-ai
Support for Pyspark dataframe
🚀 The feature
Pyspark is widely used in the community for ETL work involving large datasets. Adding support for it would increase adoption of the product.
Motivation, pitch
My org uses Pyspark as its only framework for ETL, and EDA is done by visualising various cuts of the same pyspark dataframe.
Alternatives
No response
Additional context
No response
This would be an interesting addition. I'm not sure how easy it would be to add support for pyspark in the current setup, but it's definitely worth exploring. If I understand correctly, you would like to use pyspark as an engine? Or do you just want to be able to provide a spark dataframe as input?
A Pyspark engine, and it has to accept a spark dataframe as input.
@gventuri Is there any progress/discussion on this issue? Will this be considered for future releases?
@gventuri I am also wondering whether it can execute pyspark code. Querying a large table took too long. Alternatively, is there any workaround to replace the code with pyspark code inside the pipeline?