pyspark-ai
Could not parse LLM output when using GPT3.5-turbo
When using GPT3.5-turbo with pyspark-ai, I'm getting the error below:
INFO: Creating temp view for the transform: df.createOrReplaceTempView("spark_ai_temp_view_875e9a")

Entering new AgentExecutor chain...
Traceback (most recent call last):
  File "/home/hadoop/p.py", line 19, in <module>
    df.ai.transform("count of shipments by mode")
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/ai_utils.py", line 39, in transform
    return self.spark_ai.transform_df(self.df_instance, desc, cache)
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 376, in transform_df
    sql_query = self._get_transform_sql_query(df, desc, cache)
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 354, in _get_transform_sql_query
    sql_query = self._get_transform_sql_query_from_agent(
  File "/home/hadoop/.local/lib/python3.11/site-packages/pyspark_ai/pyspark_ai.py", line 332, in _get_transform_sql_query_from_agent
    llm_result = self._sql_agent.run(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 480, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 282, in __call__
    raise e
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/chains/base.py", line 276, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 1036, in _call
    next_step_output = self._take_next_step(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 844, in _take_next_step
    raise e
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 833, in _take_next_step
    output = self.agent.plan(
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/agent.py", line 457, in plan
    return self.output_parser.parse(full_output)
  File "/home/hadoop/.local/lib/python3.11/site-packages/langchain/agents/mrkl/output_parser.py", line 52, in parse
    raise OutputParserException(
langchain.schema.output_parser.OutputParserException: Could not parse LLM output: SELECT l_shipmode, COUNT(*) AS shipment_count FROM spark_ai_temp_view_875e9a GROUP BY l_shipmode

23/08/28 17:18:52 INFO SparkContext: Invoking stop() from shutdown hook
23/08/28 17:18:52 INFO SparkContext: SparkContext is stopping with exitCode 0.
23/08/28 17:18:52 INFO SparkUI: Stopped Spark web UI at http://ip-10-0-10-180.ec2.internal:4040
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: Interrupting monitor thread
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: Shutting down all executors
23/08/28 17:18:52 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/08/28 17:18:52 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
23/08/28 17:18:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/08/28 17:18:52 INFO MemoryStore: MemoryStore cleared
23/08/28 17:18:52 INFO BlockManager: BlockManager stopped
23/08/28 17:18:52 INFO BlockManagerMaster: BlockManagerMaster stopped
23/08/28 17:18:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/08/28 17:18:52 INFO SparkContext: Successfully stopped SparkContext
23/08/28 17:18:52 INFO ShutdownHookManager: Shutdown hook called
From the error logs, the temp view and the generated query look correct, but the agent is not able to parse the LLM output.
Here is my code:
from pyspark_ai import SparkAI
from pyspark.sql import SparkSession
from langchain.chat_models import ChatOpenAI
spark = SparkSession.builder \
    .appName("ReadFromHiveTable") \
    .enableHiveSupport() \
    .getOrCreate()

spark_ai = SparkAI(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
spark_ai.activate()
df = spark.sql("select * from data_lake.lineitem")
df.show(5)
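# Sanity check: register a temp view and run the generated SQL by hand; this works fine.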
df.createOrReplaceTempView("spark_ai_temp_view_cb198e")
spark.sql("SELECT l_shipmode, COUNT(*) AS shipment_count FROM spark_ai_temp_view_cb198e GROUP BY l_shipmode").show()
df.ai.transform("count of shipments by mode")
df.show(5)
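Since the generated SQL itself is valid, one stopgap is to catch the parser error and run the SQL it carries directly: LangChain's MRKL parser raises OutputParserException when the model replies with the bare SQL instead of the Action/Final Answer format it expects, and the raw output is included in the exception message. Below is a minimal sketch of that workaround, not a pyspark-ai API; it assumes the temp view registered for the failed transform is still in the session and that the exception message matches the traceback above.

from langchain.schema.output_parser import OutputParserException

try:
    result_df = df.ai.transform("count of shipments by mode")
except OutputParserException as e:
    # The message looks like "Could not parse LLM output: <sql>" per the
    # traceback above; the SQL may additionally be wrapped in backticks.
    sql_query = str(e).removeprefix("Could not parse LLM output: ").strip("` \n")
    result_df = spark.sql(sql_query)

result_df.show()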
@s4saurabh Thanks for reporting! This is a known issue: the ReAct SQL agent works best with GPT-4. I am working on making the transform API work with gpt-3.5.
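In the meantime, switching the LLM to GPT-4 should avoid the parsing failure. A minimal sketch, assuming your OpenAI key has GPT-4 access; the rest of your code stays the same:

from langchain.chat_models import ChatOpenAI
from pyspark_ai import SparkAI

# Use GPT-4 for the underlying SQL agent instead of gpt-3.5-turbo.
spark_ai = SparkAI(llm=ChatOpenAI(model_name="gpt-4", temperature=0))
spark_ai.activate()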