pyspark-ai
Error when executing spark_ai.activate(). Please help
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[14], line 2
      1 # Activate partial functions for Spark DataFrame
----> 2 spark_ai.activate()

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/pyspark_ai/pyspark_ai.py:428, in SparkAI.activate(self)
    426 DataFrame.ai = AIUtils(self)
    427 # Patch the Spark Connect DataFrame as well.
--> 428 from pyspark.sql.connect.dataframe import DataFrame as CDataFrame
    429 CDataFrame.ai = AIUtils(self)

ModuleNotFoundError: No module named 'pyspark.sql.connect'
@FahimMohd Thanks for the feedback. It is recommended to use PySpark 3.4.0 and above, which supports Spark Connect. I will improve this by ignoring the Spark Connect import error.
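A minimal sketch of the kind of guard described above, assuming the fix simply catches the import error when Spark Connect is unavailable. The names `CDataFrame` and the import path mirror the traceback; the helper function name and the `ai_utils` parameter are hypothetical, introduced only for illustration:

```python
def patch_connect_dataframe(ai_utils):
    """Attach the .ai accessor to the Spark Connect DataFrame, if present.

    On PySpark < 3.4 the pyspark.sql.connect module does not exist, so the
    import fails; catching ImportError lets activate() proceed anyway.
    """
    try:
        from pyspark.sql.connect.dataframe import DataFrame as CDataFrame
    except ImportError:
        return False  # Spark Connect not available: skip the patch quietly
    CDataFrame.ai = ai_utils
    return True
```

`ModuleNotFoundError` is a subclass of `ImportError`, so this guard also covers the exact error in the traceback above.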
Thanks a lot for the quick response! I was able to make progress; however, I encountered one more issue:
ImportError: cannot import name '_from_numpy_type' from 'pyspark.sql.types' (/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pyspark/sql/types.py)
It seems I am missing something. Any thoughts on this, please?
@FahimMohd This should be fixed in https://github.com/databrickslabs/pyspark-ai/pull/67. Please try upgrading the package and see if it still happens.
Thanks @gengliangwang for the quick fix. I could proceed further, and spark_ai.activate() now succeeds, but I got the issue below:
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_13799/1168648997.py in ?()
----> 1 transformed_df = df.ai.transform('What is the count of transactions ?')

~/anaconda3/envs/python3/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, name)
   5985     and name not in self._accessors
   5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5987 ):
   5988     return self[name]
-> 5989 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'ai'
@FahimMohd Could you post the whole Python code? It doesn't look like you have executed spark_ai.activate().
Please refer to this.
@FahimMohd df must be a Spark DataFrame. In your code, it is a pandas DataFrame.
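To illustrate why the AttributeError occurs, here is a minimal sketch that needs no PySpark at all. The two classes are stand-ins for the real Spark and pandas DataFrame classes: activate() attaches the accessor to the Spark DataFrame class only, so instances of any other class, including pandas DataFrames, never gain `.ai`:

```python
class SparkDataFrame:        # stand-in for pyspark.sql.DataFrame
    pass

class PandasDataFrame:       # stand-in for pandas.DataFrame
    pass

# What spark_ai.activate() effectively does: patch the Spark class only.
SparkDataFrame.ai = "AIUtils instance"

spark_df = SparkDataFrame()
pandas_df = PandasDataFrame()

print(hasattr(spark_df, "ai"))    # True: the patched class exposes .ai
print(hasattr(pandas_df, "ai"))   # False: unpatched class, AttributeError on access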
It worked !!!