pyspark-ai
Error when executing spark_ai.activate(). Please help
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[14], line 2
      1 # Activate partial functions for Spark DataFrame
----> 2 spark_ai.activate()

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/pyspark_ai/pyspark_ai.py:428, in SparkAI.activate(self)
    426 DataFrame.ai = AIUtils(self)
    427 # Patch the Spark Connect DataFrame as well.
--> 428 from pyspark.sql.connect.dataframe import DataFrame as CDataFrame
    429 CDataFrame.ai = AIUtils(self)

ModuleNotFoundError: No module named 'pyspark.sql.connect'
@FahimMohd Thanks for the feedback. It is recommended to use PySpark 3.4.0 and above, which supports Spark Connect. I will improve this by ignoring the Spark Connect import error.
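A minimal sketch of the kind of guard described above, assuming the fix simply catches the import error when Spark Connect is unavailable. The names `CDataFrame` and the import path mirror the traceback; the helper function name and the `ai_utils` parameter are hypothetical, introduced only for illustration:

```python
def patch_connect_dataframe(ai_utils):
    """Attach the .ai accessor to the Spark Connect DataFrame, if present.

    On PySpark < 3.4 the pyspark.sql.connect module does not exist, so the
    import fails; catching ImportError lets activate() proceed anyway.
    """
    try:
        from pyspark.sql.connect.dataframe import DataFrame as CDataFrame
    except ImportError:
        return False  # Spark Connect not available: skip the patch quietly
    CDataFrame.ai = ai_utils
    return True
```

`ModuleNotFoundError` is a subclass of `ImportError`, so this guard also covers the exact error in the traceback above.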
Thanks a lot for the quick response! I was able to make progress; however, I encountered one more issue:
ImportError: cannot import name '_from_numpy_type' from 'pyspark.sql.types' (/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pyspark/sql/types.py)
It seems I am missing something. Any thoughts on this, please?
@FahimMohd This should be fixed in https://github.com/databrickslabs/pyspark-ai/pull/67. Please try upgrading the package and see if it still happens.
Thanks @gengliangwang for the quick fix. I could proceed further, and spark_ai.activate() now succeeds, but I got the issue below:
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_13799/1168648997.py in ?()
----> 1 transformed_df = df.ai.transform('What is the count of transactions ?')

~/anaconda3/envs/python3/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, name)
   5985     and name not in self._accessors
   5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5987 ):
   5988     return self[name]
-> 5989 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'ai'
@FahimMohd Could you post the whole Python code? It doesn't look like you have executed spark_ai.activate().
Please refer to this.
@FahimMohd df must be a Spark DataFrame. In your code, it is a pandas DataFrame.
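To illustrate why the AttributeError occurs, here is a minimal sketch that needs no PySpark at all. The two classes are stand-ins for the real Spark and pandas DataFrame classes: activate() attaches the accessor to the Spark DataFrame class only, so instances of any other class, including pandas DataFrames, never gain `.ai`:

```python
class SparkDataFrame:        # stand-in for pyspark.sql.DataFrame
    pass

class PandasDataFrame:       # stand-in for pandas.DataFrame
    pass

# What spark_ai.activate() effectively does: patch the Spark class only.
SparkDataFrame.ai = "AIUtils instance"

spark_df = SparkDataFrame()
pandas_df = PandasDataFrame()

print(hasattr(spark_df, "ai"))    # True: the patched class exposes .ai
print(hasattr(pandas_df, "ai"))   # False: unpatched class, AttributeError on access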
It worked !!!