
`BaseRestartWorkChain`: pause at `max_iterations`

Open mbercx opened this issue 1 month ago • 7 comments

From @mikibonacci and @giovannipizzi in https://github.com/aiidateam/aiida-core/pull/7069#issuecomment-3501267477

With @giovannipizzi, we were discussing adding the pause feature for handled errors that have reached the maximum number of allowed restarts (`max_iterations`) as well. It can happen that the CalcJobs hit the same exit code several times while the handling is not effective (for whatever reason), so a pause can help by letting the user decide how to proceed. For example, I was running a muon calculation in which the accuracy of the SCF never decreased (due to some sort of issue in QE). The walltime was not enough, and the WorkChain restarted 5 times without solving the problem, because the walltime was not the real issue.

Something like `on_handled_but_exceeded_failure`?
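To make the proposal concrete, here is a minimal, self-contained sketch of the control flow being discussed. It does not use the aiida-core API: `run_with_restarts`, `RestartResult`, the `pause_at_max_iterations` flag, and the exit code `400` are all hypothetical names and values chosen for illustration, and every non-zero exit code is assumed to be "handled" by a restart.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RestartResult:
    state: str           # "finished", "paused", or "failed"
    iterations: int      # number of attempts that were actually run
    last_exit_code: int  # exit code of the final attempt


def run_with_restarts(
    run_once: Callable[[int], int],
    max_iterations: int = 5,
    pause_at_max_iterations: bool = False,  # hypothetical option
) -> RestartResult:
    """Rerun `run_once` until it returns 0 or `max_iterations` is reached.

    Assumption: every non-zero exit code is "handled", i.e. it triggers
    a restart, even if the handler does not fix the underlying problem.
    """
    exit_code = 0
    for iteration in range(1, max_iterations + 1):
        exit_code = run_once(iteration)
        if exit_code == 0:
            return RestartResult("finished", iteration, exit_code)

    # The restart budget is exhausted: pause so the user can decide how
    # to proceed, or fail outright when the option is disabled.
    state = "paused" if pause_at_max_iterations else "failed"
    return RestartResult(state, max_iterations, exit_code)


# A calculation that keeps hitting the same (hypothetical) exit code.
always_out_of_walltime = lambda iteration: 400

paused = run_with_restarts(always_out_of_walltime, 5, pause_at_max_iterations=True)
failed = run_with_restarts(always_out_of_walltime, 5)
```

With the flag enabled, the loop ends in a `"paused"` state after five attempts instead of `"failed"`, which is the point at which a user could inspect the repeated exit code and choose what to do next.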

mbercx avatar Nov 07 '25 20:11 mbercx

@sNakiex What DB provider are you using? EF Core, MongoDB, something else?

sfmskywalker avatar Nov 12 '25 07:11 sfmskywalker

@sfmskywalker I am using EF Core

sNakiex avatar Nov 12 '25 12:11 sNakiex

But I think the issue isn't inherently connected to the DB provider used; rather, it stems from the way the `ActivityRegistry` is currently implemented.

Tenant 1 fetches the available activities. The `ActivityRegistry` loads all available activities for the given tenant based on the providers.

Tenant 2 fetches the available activities. The `ActivityRegistry` loads all available activities, but it also retains the activities that were loaded previously.

The result is that Tenant 2 will see the published workflows that are marked as UsableAsActivity from Tenant 1.

This happens because the new collection of activity descriptors is initialized from the already existing `ConcurrentDictionary`.

In `RefreshDescriptorsAsync`: `var activityDescriptors = new ConcurrentDictionary<(string Type, int Version), ActivityDescriptor>(_activityDescriptors);`
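The retention behaviour described above can be reproduced with a small, language-agnostic sketch. This is a Python stand-in, not the actual registry code: `LeakyRegistry`, `FreshRegistry`, and the `refresh` method are hypothetical names, and plain dictionaries keyed by `(type, version)` stand in for the `ConcurrentDictionary`. Starting from an empty dictionary is shown only as one possible direction for a fix; a real fix might instead need a cache keyed per tenant.

```python
class LeakyRegistry:
    """Seeds each refresh with the previous descriptors (the reported behaviour)."""

    def __init__(self):
        self._descriptors = {}

    def refresh(self, tenant, provided):
        # Copying the old dictionary mirrors the copy-constructor call
        # quoted above: entries from earlier tenants are carried over.
        descriptors = dict(self._descriptors)
        for type_name, version in provided:
            descriptors[(type_name, version)] = tenant
        self._descriptors = descriptors
        return sorted(descriptors)


class FreshRegistry:
    """Starts each refresh from an empty dictionary (an assumed fix)."""

    def __init__(self):
        self._descriptors = {}

    def refresh(self, tenant, provided):
        descriptors = {}  # nothing is retained from earlier refreshes
        for type_name, version in provided:
            descriptors[(type_name, version)] = tenant
        self._descriptors = descriptors
        return sorted(descriptors)


leaky = LeakyRegistry()
leaky.refresh("tenant-1", [("WorkflowA", 1)])
leaky_view = leaky.refresh("tenant-2", [("WorkflowB", 1)])

fresh = FreshRegistry()
fresh.refresh("tenant-1", [("WorkflowA", 1)])
fresh_view = fresh.refresh("tenant-2", [("WorkflowB", 1)])
```

In the leaky variant, the second refresh still lists `("WorkflowA", 1)`, which belongs to tenant 1; in the variant that starts empty, tenant 2 only sees its own `("WorkflowB", 1)`.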

sNakiex avatar Nov 13 '25 10:11 sNakiex