[BUG] `Can't pickle <class 'datetime.datetime'>: it's not the same object as datetime.datetime` when running flow in Azure ML
Describe the bug
When running our flow in Azure ML, we see the following exception:
Can't pickle <class 'datetime.datetime'>: it's not the same object as datetime.datetime
Traceback (most recent call last):
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 820, in _exec
output, nodes_outputs = self._traverse_nodes(inputs, context)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 918, in _traverse_nodes
nodes_outputs, bypassed_nodes = self._submit_to_scheduler(context, inputs, batch_nodes)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 938, in _submit_to_scheduler
return FlowNodesScheduler(self._tools_manager, inputs, nodes, self._node_concurrency, context).execute()
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 69, in execute
raise e
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 58, in execute
self._dag_manager.complete_nodes(self._collect_outputs(completed_futures))
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 90, in _collect_outputs
each_node_result = each_future.result()
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 111, in _exec_single_node_in_thread
result = context.invoke_tool(node, f, kwargs=kwargs)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/_core/flow_execution_context.py", line 107, in invoke_tool
self._run_tracker.persist_node_run(run_info)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/_core/run_tracker.py", line 388, in persist_node_run
self._storage.persist_node_run(run_info)
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_line_execution_process_pool.py", line 64, in persist_node_run
self.queue.put(run_info)
File "<string>", line 2, in put
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
Locally, the same flow always works.
How To Reproduce the bug
We don't have a minimal reproduction of the issue.
Expected behavior
Run always succeeds.
Running Information(please complete the following information):
- Promptflow Package Version
promptflow==1.4.1
- Operating System: WSL2 (Ubuntu)
- Python Version
Python 3.9.18
Hi @bastbu,
Thank you for your feedback. We are currently on the Lunar New Year holiday, so it may take some time before we can formally investigate and fix the issue. We expect to start the investigation once the holiday ends on February 18th.
According to this link, restarting the compute or changing the session (for automatic runtime case) may help resolve the error. Could you please give it a try and see if the error still occurs?
@thy09 please take a look at this issue when available. After a basic search, I found that this SO answer describes the same error. However, there is no known root cause or fix at the moment.
Our investigation suggests that the issue most likely stems from how data is transferred between processes during a batch run.
When conducting a flow test, we employ a primary process to execute the flow and transfer all the data to storage without encountering any issues. However, during batch run, to prevent job failure caused by a single line becoming unresponsive, we utilize subprocesses as workers to handle individual lines. These workers then relay the data back to the main process for storage. During this stage, data is transferred using Python's Queue, which involves serialization through pickling.
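The pickling step described above can be sketched in isolation. This is a minimal illustration, not promptflow's code: it calls `ForkingPickler.dumps`, the same standard-library call that appears at the bottom of the traceback, on a hypothetical `run_info` dictionary, and then unpickles the result as the receiving process would.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def cross_process_boundary(obj):
    """Serialize and deserialize obj the way a multiprocessing connection does."""
    # The sending side turns the object into bytes ...
    payload = ForkingPickler.dumps(obj)
    # ... and the receiving side rebuilds an independent copy from those bytes.
    return pickle.loads(payload)


# Hypothetical node result; the real run info object has more fields.
run_info = {"node": "example_node", "status": "Completed", "output": [1, 2, 3]}
restored = cross_process_boundary(run_info)
print(restored == run_info, restored is run_info)
```

Anything placed on the queue has to survive this round trip, which is why a value that cannot be pickled only causes trouble once worker processes are involved.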
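However, some values cannot be pickled. In this issue the data contained a datetime object while `datetime.datetime` appears to have been mocked, so when pickle looks the class up by name it finds a different object and serialization fails. Because a flow test never pickles the data, the error shows up only in batch runs.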
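A minimal sketch of how a mocked `datetime.datetime` can trigger this kind of error, using only the standard library. The `FakeDatetime` class is a hypothetical stand-in for whatever a mocking library installs; which library (if any) was active in the reported run is not known from this thread.

```python
import datetime
import pickle


def pickle_with_patched_datetime():
    """Return the pickling error message seen while datetime.datetime is replaced."""
    real_datetime = datetime.datetime
    value = real_datetime(2024, 2, 1, 12, 0)  # instance of the real class

    class FakeDatetime(real_datetime):
        """Hypothetical replacement class, as a mocking library might install."""

    # Simulate the mock: the module attribute now points at a different class.
    datetime.datetime = FakeDatetime
    try:
        # pickle stores the class of `value` by name, looks that name up in the
        # datetime module, and finds FakeDatetime instead of the real class.
        pickle.dumps(value)
    except pickle.PicklingError as exc:
        return str(exc)
    finally:
        # Always restore the real class so later code is unaffected.
        datetime.datetime = real_datetime
    return None


error = pickle_with_patched_datetime()
print(error)
```

Once the real class is restored, the same value pickles without error, which is consistent with the failure depending on the state of the process rather than on the data itself.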
Given that the ultimate goal is to store the data in JSON format, a potential solution is to convert the data to JSON prior to passing it to the main process.
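A rough sketch of that idea, assuming the run info can be represented as a dictionary. The names `to_json_safe` and `serialize_run_info` are hypothetical and not part of promptflow, and `queue.Queue` stands in for the multiprocessing queue so the example runs in a single process.

```python
import datetime
import json
import queue


def to_json_safe(obj):
    """Fallback for values json cannot encode natively."""
    # Duck-typed check, so it does not depend on which datetime class is installed.
    if hasattr(obj, "isoformat"):
        return obj.isoformat()
    return str(obj)


def serialize_run_info(run_info):
    """Convert run info to a JSON string before it is handed to the queue."""
    return json.dumps(run_info, default=to_json_safe)


q = queue.Queue()  # stand-in for the queue shared with the main process
run_info = {"node": "example_node", "start_time": datetime.datetime(2024, 2, 1, 12, 0)}
q.put(serialize_run_info(run_info))

# The main process receives a plain string and can store or parse it directly.
received = json.loads(q.get())
print(received)
```

A plain string is pickled by value, so no class lookup by name is involved when it crosses the process boundary. The trade-off is that the main process gets ISO-formatted strings rather than datetime objects, which should be acceptable if the data is written out as JSON anyway.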
Next, we will investigate the code changes needed to implement this solution. Thanks for the feedback.
Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!