proposal: workflow concurrency control

Open Joibel opened this issue 1 year ago • 1 comments

Current state

Users have no built-in way of limiting simultaneous runs of identical workflows if they wish to do so.

This is because mutexes and semaphores in Argo only prevent more than N runs from holding a lock at the same time; they do not limit concurrency at the workflow definition level. Even if they did, they would only cover the case where the user wants to delay execution (a queue) rather than cancel one of the runs.

User story

As an Argo user I want the ability to enable a limit for the number of concurrent runs of an identical workflow definition so I can achieve any of the following benefits:

  • Prevent data duplication and data conflicts
  • Conserve compute resources

Examples:

  • Spark streaming job => FIFO use case: We want only 1 run of it at any given point, giving preference to the existing run. We want to prevent someone else from triggering the job. If another user tries to run the same job, we want the new run to fail.
  • CI job => LIFO use case: We want only 1 run of the CI pipeline at any given point, giving preference to the new run (often triggered by a new Git event, e.g. a commit). We want to stop the existing run if it is still in progress, since it is now outdated, and let the newly submitted workflow run.

Proposal

CronWorkflows already have concurrencyPolicy matching that of a native CronJob.

We would add a concurrencyPolicy field in the spec of a Workflow (it would also work if specified in a WorkflowTemplate spec consumed via workflowTemplateRef).

Valid policies are Allow, Forbid and Replace. These are the same concurrencyPolicies as available in CronWorkflow/CronJob.

  • Allow: Would keep the current behavior and is the default.
  • Forbid: Would prevent a new workflow from running while a matching workflow is still running; the new workflow would be stopped.
  • Replace: Would replace an older workflow with the new one by terminating the existing workflow.

As there is otherwise no way of knowing which other workflows to group against, we would also add an optional field, concurrencyMatchLabels. Without this field the concurrencyPolicy would do nothing, because no other workflows would be in the group.

concurrencyPolicy: Forbid
concurrencyMatchLabels:
  workflows.argoproj.io/cluster-workflow-template: my-ci

The match block works like label matching in many other systems and would be used to find the other workflows to which this concurrency policy applies.
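
For illustration, a complete Workflow using the proposed fields might look like the sketch below. concurrencyPolicy and concurrencyMatchLabels are the fields proposed in this issue and do not exist in the Workflow spec today. The generateName value is a placeholder, and the sketch assumes the controller sets the workflows.argoproj.io/cluster-workflow-template label on workflows created from the my-ci ClusterWorkflowTemplate.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: my-ci-   # placeholder name
spec:
  # Run the ClusterWorkflowTemplate named my-ci.
  workflowTemplateRef:
    name: my-ci
    clusterScope: true
  # Hypothetical (proposed) fields below; not part of the current spec.
  concurrencyPolicy: Forbid
  concurrencyMatchLabels:
    # Group with every other workflow carrying this label.
    workflows.argoproj.io/cluster-workflow-template: my-ci
```

With Forbid, a second submission of this manifest while the first is still running would be stopped rather than queued.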

This would silently do nothing for WorkflowTemplates that are not consumed via workflowTemplateRef, in the same way that other similar fields such as volumes do.
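
A sketch of that distinction, under the assumption that the proposed fields would sit at the top level of the WorkflowTemplate spec. The two concurrency fields are hypothetical; the template name, image, and workflow names are placeholders.

```yaml
# WorkflowTemplate carrying the proposed (hypothetical) fields.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-ci
spec:
  entrypoint: main
  concurrencyPolicy: Replace            # hypothetical field
  concurrencyMatchLabels:               # hypothetical field
    workflows.argoproj.io/workflow-template: my-ci
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [echo, "running CI"]
---
# Policy would apply: the whole template spec is consumed via workflowTemplateRef.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: my-ci-
spec:
  workflowTemplateRef:
    name: my-ci
---
# Policy would be silently ignored: only a single template is pulled in
# via templateRef, so spec-level fields of the WorkflowTemplate are not used.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: other-
spec:
  entrypoint: call
  templates:
    - name: call
      steps:
        - - name: run
            templateRef:
              name: my-ci
              template: main
```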

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritize the proposals with the most 👍.

Joibel avatar Mar 07 '24 10:03 Joibel

#13055 essentially proposed this too, but on a semaphore or mutex, which would obviate the need for selectors (although they are a k8s standard and quite powerful, they're usually used for resources that depend on others).

That wouldn't handle the case of parallelism, but semaphores are more-or-less a superset of parallelism and mutexes, so I think that could be fine.

concurrencyPolicy on a semaphore or mutex would be more straightforward to implement and only adds a single field to the spec, one that is already supported by CronWorkflows and CronJobs.
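
Roughly what I have in mind, as a sketch only. The synchronization.mutex block is the existing workflow-level syntax; the concurrencyPolicy field under it is hypothetical, and its exact placement is an assumption. Names and the image are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: my-ci-
spec:
  entrypoint: main
  synchronization:
    mutex:
      name: my-ci
      # Hypothetical field: today a second workflow simply waits for the lock.
      # Replace would instead terminate the current holder.
      concurrencyPolicy: Replace
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [echo, "running CI"]
```

The mutex name already defines the group, so no label selector is needed.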

agilgur5 avatar May 15 '24 16:05 agilgur5

I'm interested in the "replace" behavior, but it would need to be able to have some key to refer to what should be replaced. Perhaps this could be done with labels, but ideally it would be able to be based on input parameters.

Currently accomplishing this with GitHub Actions using the following configuration:

concurrency:
  group: ${{ inputs.ref }}
  cancel-in-progress: true

This limits to 1 execution per ref, and when a new one is submitted the in-progress one is canceled.
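
An Argo equivalent might look something like the sketch below. It assumes the lock name can be templated from a workflow parameter, and it uses a concurrencyPolicy field on the mutex that does not exist today (it is only what is being discussed in this thread). The parameter name ref and the other names are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ci-
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: ref
        value: main
  synchronization:
    mutex:
      # One lock per ref, analogous to `group: ${{ inputs.ref }}`.
      name: "ci-{{workflow.parameters.ref}}"
      # Hypothetical field, analogous to `cancel-in-progress: true`.
      concurrencyPolicy: Replace
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [echo, "building {{workflow.parameters.ref}}"]
```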

reilly3000 avatar Oct 07 '24 21:10 reilly3000

I stumbled upon @LinuxSuRen's argo-workflow-atomic-plugin today, which looks like a user-land workaround for the Replace policy.

agilgur5 avatar Oct 26 '24 04:10 agilgur5