pyspark-ai
VertexAI - Error: NameError("name 'spark' is not defined")
Hi,
I'm having an issue using Vertex AI as the LLM.
This is the log:
import plotly.express as px
df = spark.sql("SELECT Nationality, count(*) as cnt FROM football_stats GROUP BY Nationality ORDER BY cnt DESC LIMIT 10")
df_pd = df.toPandas()
fig = px.pie(df_pd, values="cnt", names="Nationality", title="Top 10 Nationalities")
fig.show()
INFO:spark_ai:
import plotly.express as px
df = spark.sql("SELECT Nationality, count(*) as cnt FROM football_stats GROUP BY Nationality ORDER BY cnt DESC LIMIT 10")
df_pd = df.toPandas()
fig = px.pie(df_pd, values="cnt", names="Nationality", title="Top 10 Nationalities")
fig.show()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyspark_ai/pyspark_ai.py in plot_df(self, df, desc, cache)
412 try:
--> 413 exec(compile(code, "plot_df-CodeGen", "exec"))
414 except Exception as e:
3 frames
NameError: name 'spark' is not defined
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyspark_ai/pyspark_ai.py in plot_df(self, df, desc, cache)
413 exec(compile(code, "plot_df-CodeGen", "exec"))
414 except Exception as e:
--> 415 raise Exception("Could not evaluate Python code", e)
416
417 def verify_df(self, df: DataFrame, desc: str, cache: bool = True) -> None:
Exception: ('Could not evaluate Python code', NameError("name 'spark' is not defined"))
And this is my configuration:
# Create the Spark session
spark = SparkSession.builder \
    .appName('PySparkAI_with_BQ') \
    .config('spark.jars', "/content/spark-3.3-bigquery-0.32.0.jar") \
    .getOrCreate()
# Init Google AI platform
aiplatform.init(project=project_id)
llm = VertexAI(temperature=0.9)
# Activate pyspark_ai
spark_ai = SparkAI(llm=llm, spark_session=spark, verbose=True)
spark_ai.activate()  # activate partial functions for Spark DataFrame
# Load data from BQ
bq_source = spark.read.format('bigquery') \
    .option('project', 'project-1') \
    .option('parentProject', 'project-1') \
    .option('table', 'dataset_name.football_stats') \
    .load()
auto_graph = bq_source.ai.plot("Create a pie chart with the top 10 nationalities")
Curiously, the tool takes a different approach on every new run. For example:
import plotly.express as px
df = spark.read.csv('football_stats.csv', header=True, inferSchema=True)
Why is it trying to read a CSV file?
Any advice is appreciated. Thanks!
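
For anyone trying to reproduce this: the traceback shows the generated code being run through exec(compile(code, "plot_df-CodeGen", "exec")) inside plot_df. The sketch below is plain Python, not pyspark-ai code, and shows one way such a NameError can arise: a name like spark is only visible to the executed string if it exists in the namespace that exec uses. The names GENERATED, run_generated and try_run are hypothetical, and the string values are stand-ins for a real session and DataFrame. How pyspark-ai actually builds its namespace is an assumption here.

```python
# Stand-in for LLM-generated code that refers to a name called `spark`.
# (Hypothetical: real generated code would call spark.sql(...) instead.)
GENERATED = "result = spark + ':' + df"


def run_generated(code, scope):
    """Compile and exec a code string inside an explicit namespace dict."""
    exec(compile(code, "plot_df-CodeGen", "exec"), scope)
    return scope


def try_run(code, scope):
    """Return the computed result, or the NameError message if a name is missing."""
    try:
        return run_generated(code, scope)["result"]
    except NameError as e:
        return f"NameError: {e}"


# Only `df` is visible to the generated code: `spark` cannot be resolved.
without_spark = try_run(GENERATED, {"df": "dataframe"})

# Both `df` and `spark` are visible: the same code runs.
with_spark = try_run(GENERATED, {"df": "dataframe", "spark": "session"})

print(without_spark)
print(with_spark)
```

If this is what is happening, the generated code would only work when it sticks to the names that are actually in scope at the exec call (the traceback suggests df is one of them), which may also be why output that reads from spark.sql or spark.read.csv fails.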