
Could not parse LLM output when using GPT3.5-turbo

Open · s4saurabh opened this issue 1 year ago • 1 comment

When using GPT3.5-turbo with pyspark-ai, I'm getting the error below:

INFO: Creating temp view for the transform: df.createOrReplaceTempView("spark_ai_temp_view_875e9a")

Entering new AgentExecutor chain...
Traceback (most recent call last):
  File "/home/hadoop/p.py", line 19, in <module>
    df.ai.transform("count of shipments by mode")
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/ai_utils.py", line 39, in transform
    return self.spark_ai.transform_df(self.df_instance, desc, cache)
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 376, in transform_df
    sql_query = self._get_transform_sql_query(df, desc, cache)
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 354, in _get_transform_sql_query
    sql_query = self._get_transform_sql_query_from_agent(
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 332, in _get_transform_sql_query_from_agent
    llm_result = self._sql_agent.run(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 480, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 282, in __call__
    raise e
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 276, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 1036, in _call
    next_step_output = self._take_next_step(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 844, in _take_next_step
    raise e
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 833, in _take_next_step
    output = self.agent.plan(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 457, in plan
    return self.output_parser.parse(full_output)
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/mrkl/output_parser.py", line 52, in parse
    raise OutputParserException(
langchain.schema.output_parser.OutputParserException: Could not parse LLM output: SELECT l_shipmode, COUNT(*) AS shipment_count FROM spark_ai_temp_view_875e9a GROUP BY l_shipmode

23/08/28 17:18:52 INFO SparkContext: Invoking stop() from shutdown hook
23/08/28 17:18:52 INFO SparkContext: SparkContext is stopping with exitCode 0.
23/08/28 17:18:52 INFO SparkUI: Stopped Spark web UI at http://ip-10-0-10-180.ec2.internal:4040
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: Interrupting monitor thread
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: Shutting down all executors
23/08/28 17:18:52 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
23/08/28 17:18:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/08/28 17:18:52 INFO MemoryStore: MemoryStore cleared
23/08/28 17:18:52 INFO BlockManager: BlockManager stopped
23/08/28 17:18:52 INFO BlockManagerMaster: BlockManagerMaster stopped
23/08/28 17:18:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/08/28 17:18:52 INFO SparkContext: Successfully stopped SparkContext
23/08/28 17:18:52 INFO ShutdownHookManager: Shutdown hook called

From the error logs, the temp view and the generated query look correct, but the agent is not able to parse the LLM output.
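For context, the traceback shows the failure happening in langchain's MRKL (ReAct) output parser, which expects either an "Action:" block or a "Final Answer:" prefix; GPT-3.5 appears to have replied with bare SQL instead. The following is a minimal, self-contained illustration of that behavior, not pyspark-ai code, and the exact parser class is an assumption based on the traceback above:

from langchain.agents.mrkl.output_parser import MRKLOutputParser
from langchain.schema.output_parser import OutputParserException

parser = MRKLOutputParser()

# Roughly what GPT-3.5 emitted: raw SQL with no ReAct markers -> parse error.
try:
    parser.parse("SELECT l_shipmode, COUNT(*) AS shipment_count "
                 "FROM spark_ai_temp_view_875e9a GROUP BY l_shipmode")
except OutputParserException as e:
    print(e)  # Could not parse LLM output: SELECT ...

# The same SQL behind a "Final Answer:" prefix parses as an AgentFinish.
finish = parser.parse("Final Answer: SELECT l_shipmode, COUNT(*) AS shipment_count "
                      "FROM spark_ai_temp_view_875e9a GROUP BY l_shipmode")
print(finish.return_values["output"])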

Here is my code:

from pyspark_ai import SparkAI
from pyspark.sql import SparkSession
from langchain.chat_models import ChatOpenAI

spark = SparkSession.builder \
    .appName("ReadFromHiveTable") \
    .enableHiveSupport() \
    .getOrCreate()

spark_ai = SparkAI(llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
spark_ai.activate()

df = spark.sql("select * from data_lake.lineitem")
df.show(5)

df.createOrReplaceTempView("spark_ai_temp_view_cb198e")
spark.sql("SELECT l_shipmode, COUNT(*) AS shipment_count FROM spark_ai_temp_view_cb198e GROUP BY l_shipmode").show()

df.ai.transform("count of shipments by mode")
df.show(5)
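One possible stopgap until this is fixed is to catch the parser error and run the SQL that is embedded in the exception message directly, since the traceback above shows the generated query reaching the parser intact. This is an untested sketch; it assumes the SQL survives verbatim in the exception text and that the temp view pyspark-ai created in this session still exists:

from langchain.schema.output_parser import OutputParserException

try:
    result_df = df.ai.transform("count of shipments by mode")
except OutputParserException as e:
    # The error text is "Could not parse LLM output: <generated SQL>";
    # strip the prefix (and any backticks) and run the query ourselves.
    sql = str(e).split("Could not parse LLM output:", 1)[-1].strip().strip("`")
    result_df = spark.sql(sql)

result_df.show(5)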

s4saurabh · Aug 28 '23 17:08

@s4saurabh Thanks for reporting! This is a known issue. The ReAct SQL agent works best with GPT-4. I am trying to make the transform API work with GPT-3.5.

gengliangwang · Sep 01 '23 23:09
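For anyone hitting this in the meantime, the comment above suggests pointing SparkAI at GPT-4. A minimal sketch, assuming the same constructor as in the reproduction code:

from langchain.chat_models import ChatOpenAI
from pyspark_ai import SparkAI

# GPT-4 tends to follow the ReAct "Thought / Action / Final Answer" format
# more reliably, so the MRKL output parser can read its replies.
spark_ai = SparkAI(llm=ChatOpenAI(model_name="gpt-4", temperature=0))
spark_ai.activate()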