Process is only stored after it fires the `ProcessListener.on_process_finished` event, resulting in an unfinished process in the next step

Open agoscinski opened this issue 4 months ago • 0 comments

Bug report from a question on Discourse (@t-reents): https://aiida.discourse.group/t/workchain-continues-before-finishing-the-pervious-step/472. Also observed by @superstar54 in the workgraph development.

Describe the bug

Already well described here https://aiida.discourse.group/t/workchain-continues-before-finishing-the-pervious-step/472

Here are my results from investigating it: the workchain starts the next step before the finished process is stored in the database. It therefore loads the process before the stored process state has been updated to Finished, which results in the nonzero exit code (`is_finished_ok` is false because the loaded process state is still running).

So on the plumpy side, the event is fired before the process is stored in the database. Here is a simplified backtrace of how the process fires the event announcing that it has finished (I guess by broadcasting), which continues the next process before the finished process has updated its process_state in the database. In the `aiida.engine.processes.process.Process.on_entered` function https://github.com/aiidateam/aiida-core/blob/c7c289d3892bf76894714f53f58b7ce5b0761178/src/aiida/engine/processes/process.py#L422 the parent method in plumpy is invoked: https://github.com/aiidateam/plumpy/blob/b3837fc9dbf7dc5aca0785e93b94cf5b89d04a91/src/plumpy/processes.py#L701 Much later, this invokes https://github.com/aiidateam/plumpy/blob/b3837fc9dbf7dc5aca0785e93b94cf5b89d04a91/src/plumpy/processes.py#L837-L840

        self._fire_event(ProcessListener.on_process_finished, self.future().result())

which broadcasts the event to all processes, resulting in the next process being continued. The update of the process state to Finished in the database (the object's process state has already been updated, but not the database!) happens later in the `aiida.engine.processes.process.Process.on_entered` function: https://github.com/aiidateam/aiida-core/blob/c7c289d3892bf76894714f53f58b7ce5b0761178/src/aiida/engine/processes/process.py#L442
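To make the ordering problem concrete, here is a minimal toy sketch of it. It uses neither AiiDA nor plumpy: `ToyProcess`, `DATABASE`, `run` and the `store_before_event` flag are all hypothetical names invented for illustration. It only shows that a listener reading the stored state sees a stale value when the finished event is fired before the state is stored.

```python
# Toy model of the ordering problem; none of these names are AiiDA or plumpy APIs.
from typing import Callable, Dict, List

DATABASE: Dict[str, str] = {}  # stands in for the stored process state


class ToyProcess:
    def __init__(self, pk: str, store_before_event: bool) -> None:
        self.pk = pk
        self.state = "running"  # in-memory process state
        self.store_before_event = store_before_event
        self.listeners: List[Callable[[str], None]] = []
        DATABASE[pk] = self.state  # the initial state is persisted

    def _store(self) -> None:
        # Persist the in-memory state to the "database".
        DATABASE[self.pk] = self.state

    def _fire_finished(self) -> None:
        # Notify every listener that this process has finished.
        for listener in self.listeners:
            listener(self.pk)

    def finish(self) -> None:
        self.state = "finished"  # the object is updated first in both orderings
        if self.store_before_event:
            self._store()
            self._fire_finished()
        else:
            # Ordering described in this issue: event first, store afterwards.
            self._fire_finished()
            self._store()


def run(store_before_event: bool) -> str:
    """Return the state a listener loads from the database when notified."""
    seen: List[str] = []
    proc = ToyProcess("child", store_before_event)
    proc.listeners.append(lambda pk: seen.append(DATABASE[pk]))
    proc.finish()
    return seen[0]


print(run(store_before_event=False))  # running
print(run(store_before_event=True))   # finished
```

In this sketch, swapping the two calls in `finish` is enough for the listener to see the finished state. Whether the same swap is safe in the real engine is exactly what still needs to be checked.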

I will try to switch the order of the two events and see what breaks.

Environment

I think it happens on all AiiDA versions (I tried the newest, 2.6.2, and 2.2) and all backends, since this looks like a bug in the engine.

Supplementary

Backtrace log up to the `_fire_event` call: backtrace.log

agoscinski, Oct 04 '24 06:10