pyspark-ai VertexAI - Error: NameError("name 'spark' is not defined")

Open chuyasturiano opened this issue 1 year ago • 0 comments

Hi,

I'm having an issue using Vertex AI as the LLM.

This is the log:

import plotly.express as px

df = spark.sql("SELECT Nationality, count(*) as cnt FROM football_stats GROUP BY Nationality ORDER BY cnt DESC LIMIT 10")
df_pd = df.toPandas()
fig = px.pie(df_pd, values="cnt", names="Nationality", title="Top 10 Nationalities")
fig.show()

INFO:spark_ai:

import plotly.express as px

df = spark.sql("SELECT Nationality, count(*) as cnt FROM football_stats GROUP BY Nationality ORDER BY cnt DESC LIMIT 10")
df_pd = df.toPandas()
fig = px.pie(df_pd, values="cnt", names="Nationality", title="Top 10 Nationalities")
fig.show()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyspark_ai/pyspark_ai.py in plot_df(self, df, desc, cache)
    412         try:
--> 413             exec(compile(code, "plot_df-CodeGen", "exec"))
    414         except Exception as e:

3 frames
NameError: name 'spark' is not defined

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyspark_ai/pyspark_ai.py in plot_df(self, df, desc, cache)
    413             exec(compile(code, "plot_df-CodeGen", "exec"))
    414         except Exception as e:
--> 415             raise Exception("Could not evaluate Python code", e)
    416 
    417     def verify_df(self, df: DataFrame, desc: str, cache: bool = True) -> None:

Exception: ('Could not evaluate Python code', NameError("name 'spark' is not defined"))

And this is my configuration:

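# Imports (assumed; they were not shown in the original post)
from google.cloud import aiplatform
from langchain.llms import VertexAI
from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

# Create the Spark session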
spark = SparkSession.builder \
  .appName('PySparkAI_with_BQ')\
  .config('spark.jars', "/content/spark-3.3-bigquery-0.32.0.jar") \
  .getOrCreate()

# Init Google AI platform
aiplatform.init(project=project_id)

llm = VertexAI(temperature=0.9)

# Activate pyspark_ai
spark_ai = SparkAI(llm=llm, spark_session=spark, verbose=True)
spark_ai.activate()  # activate partial functions for Spark DataFrame

# Load data from BigQuery
bq_source = spark.read.format('bigquery') \
    .option('project','project-1') \
    .option('parentProject','project-1') \
    .option('table','dataset_name.football_stats') \
    .load()
 
auto_graph = bq_source.ai.plot("Create a pie chart with the top 10 nationalities")

It's curious that on every new run the tool follows a different approach. For example, another run generated:

import plotly.express as px

df = spark.read.csv('football_stats.csv', header=True, inferSchema=True)

Why is it trying to read a CSV file?

Any advice is appreciated. Thanks!
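
A note on the traceback: the failing line is exec(compile(code, "plot_df-CodeGen", "exec")) inside plot_df(self, df, desc, cache), so the generated code runs in that method's scope. There the df argument exists, but a spark variable defined in the notebook does not. The pure-Python sketch below reproduces only that exec pattern. It is not the pyspark_ai implementation, and run_generated and out are hypothetical names used for illustration.

```python
def run_generated(code, df):
    """Hypothetical stand-in for plot_df: exec generated code inside a function."""
    out = []  # illustration only: generated code can append results here
    try:
        # Same pattern as in the traceback: no explicit globals/locals are passed,
        # so the code sees this function's locals and this module's globals.
        exec(compile(code, "plot_df-CodeGen", "exec"))
    except Exception as e:
        raise Exception("Could not evaluate Python code", e)
    return out


rows = [("Spain", 3), ("France", 2)]  # stand-in for a DataFrame

# Generated code that only uses `df` runs fine, because `df` is in scope.
print(run_generated("out.append(len(df))", rows))

# Generated code that refers to `spark` fails, because no such name is in scope.
try:
    run_generated("spark.sql('SELECT 1')", rows)
except Exception as e:
    print(type(e.args[1]).__name__, e.args[1])
```

If this reading is right, the code returned by the LLM would need to work on df directly rather than calling spark.sql or spark.read.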

Aug 15 '23 14:08 chuyasturiano
