promptflow [BUG] `Can't pickle <class 'datetime.datetime'>: it's not the same object as datetime.datetime` when running flow in Azure ML

Describe the bug

When running our flow in Azure ML, we see the following exception:

Can't pickle <class 'datetime.datetime'>: it's not the same object as datetime.datetime

Traceback (most recent call last):
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 820, in _exec
    output, nodes_outputs = self._traverse_nodes(inputs, context)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 918, in _traverse_nodes
    nodes_outputs, bypassed_nodes = self._submit_to_scheduler(context, inputs, batch_nodes)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/flow_executor.py", line 938, in _submit_to_scheduler
    return FlowNodesScheduler(self._tools_manager, inputs, nodes, self._node_concurrency, context).execute()
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 69, in execute
    raise e
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 58, in execute
    self._dag_manager.complete_nodes(self._collect_outputs(completed_futures))
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 90, in _collect_outputs
    each_node_result = each_future.result()
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_flow_nodes_scheduler.py", line 111, in _exec_single_node_in_thread
    result = context.invoke_tool(node, f, kwargs=kwargs)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/_core/flow_execution_context.py", line 107, in invoke_tool
    self._run_tracker.persist_node_run(run_info)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/_core/run_tracker.py", line 388, in persist_node_run
    self._storage.persist_node_run(run_info)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/_line_execution_process_pool.py", line 64, in persist_node_run
    self.queue.put(run_info)
  File "<string>", line 2, in put
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)

Locally, the same flow always works.

How To Reproduce the bug

We don't have a minimal reproduction of the issue.

Expected behavior

Run always succeeds.

Running Information (please complete the following information):

Promptflow Package Version: promptflow==1.4.1
Operating System: WSL2 (Ubuntu)
Python Version: Python 3.9.18

Feb 08 '24 10:02 bastbu

Hi @bastbu,

Thank you for your feedback. We are currently on the Lunar New Year holiday, so it may take some time before we can formally investigate and fix the issue. We expect to start the investigation once the holiday ends, on February 18th.

According to this link, restarting the compute or changing the session (in the automatic runtime case) may help resolve the error. Could you please give it a try and let us know whether the error still occurs?

@thy09, please take a look at this issue when you are available. After a basic search, I found that this SO answer describes the same error. However, there is no known root cause or fix at the moment.

Feb 09 '24 08:02 0mza987

Our investigation suggests that the issue most likely stems from how data is transferred between processes during a batch run.

In a flow test, a single main process executes the flow and writes all the data to storage, which works without issue. In a batch run, however, we use subprocesses as workers to handle individual lines, so that one unresponsive line cannot fail the whole job. These workers relay their data back to the main process for storage. That transfer goes through a Python Queue, which serializes the data by pickling.
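To illustrate the pickling step, here is a minimal sketch that runs in a single process and uses `multiprocessing.Pipe`, whose `send` call is the same `connection.py` frame that appears in the traceback. The helper name `send_through_pipe` and the sample payloads are made up for illustration and are not part of promptflow.

```python
import multiprocessing
import threading


def send_through_pipe(payload):
    """Send payload over a multiprocessing connection and read it back.

    Connection.send pickles the payload before writing it, so anything
    that cannot be pickled fails here, before any bytes are transferred.
    """
    receiver, sender = multiprocessing.Pipe(duplex=False)
    try:
        sender.send(payload)    # pickles the payload
        return receiver.recv()  # unpickles it on the other end
    finally:
        sender.close()
        receiver.close()


# A plain dict survives the round trip.
print(send_through_pipe({"line": 0, "output": "ok"}))

# A lock cannot be pickled, so send() raises before anything is written.
try:
    send_through_pipe(threading.Lock())
except TypeError as exc:
    print("send failed:", exc)
```

In a flow test nothing crosses a process boundary, so this step never runs, which would explain why the same data causes no error there.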

However, certain conditions, such as the mocked datetime object observed in this issue, make the data non-serializable, which causes errors that appear only in batch execution.
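One way to end up with this kind of pickling failure is sketched below. It assumes that something replaces the `datetime.datetime` attribute of the `datetime` module while a value created from the real class is being pickled. How the replacement happened in the reported flow is not known; `FakeDatetime` and `pickle_with_patched_datetime` are hypothetical names used only for this sketch.

```python
import datetime
import pickle
from unittest import mock


def pickle_with_patched_datetime():
    """Return the pickling error message, or None if pickling succeeded."""
    # Created while datetime.datetime is still the real class.
    real_value = datetime.datetime(2024, 2, 8, 10, 2)

    class FakeDatetime(datetime.datetime):
        """Hypothetical stand-in for whatever a mocking library installs."""

    # While the patch is active, looking up "datetime" in the datetime
    # module returns FakeDatetime, not the class that real_value belongs to.
    with mock.patch.object(datetime, "datetime", FakeDatetime):
        try:
            pickle.dumps(real_value)
        except pickle.PicklingError as exc:
            return str(exc)
    return None


print(pickle_with_patched_datetime())
```

Pickle stores a class by its module and name and checks that looking the name up again returns the very same object. With the patch active that check fails, even though the value itself is an ordinary datetime.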

Given that the ultimate goal is to store the data in JSON format, a potential solution is to convert the data to JSON prior to passing it to the main process.
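A rough sketch of that idea, assuming the run info can be represented as a dict: the worker converts it to JSON text first, so that only a plain string has to be pickled on the way to the main process. The function name `to_json_text`, the field names, and the fallback rules are assumptions made for illustration, not promptflow's actual implementation.

```python
import datetime
import json
import pickle


def to_json_text(run_info: dict) -> str:
    """Convert run info to JSON text before it crosses the process boundary."""

    def fallback(obj):
        # Duck-typed on purpose, so a patched or subclassed datetime
        # is handled the same way as a real one.
        if hasattr(obj, "isoformat"):
            return obj.isoformat()
        return str(obj)

    return json.dumps(run_info, default=fallback)


# Hypothetical run info with a datetime field.
run_info = {"node": "n1", "start_time": datetime.datetime(2024, 2, 22, 4, 2)}
text = to_json_text(run_info)

# A str always pickles, whatever classes the original values had.
assert pickle.loads(pickle.dumps(text)) == text
print(text)
```

The trade-off is that the main process receives strings rather than the original objects, which is acceptable if the data is stored as JSON anyway.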

We will investigate how to implement this change in the code. Thanks for the feedback.

Feb 22 '24 04:02 thy09

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

Apr 13 '24 21:04 github-actions[bot]

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

May 25 '24 21:05 github-actions[bot]