experimental Concurrency limiter controller

Opening this issue to collect ideas, discussion, interest, etc., for a supplemental PipelineRun controller (and possibly a TaskRun controller?) that manages Pending PipelineRuns and updates them to a Running state to limit execution concurrency.

We've heard a few use cases for limiting execution concurrency, but so far it's been hard to generalize the various needs into one unified "concurrency" concept that we can apply across all of Tekton Pipelines. Some users might want only one "deployment" pipeline running at a time across the whole cluster. Others might want one "deployment" pipeline per namespace, or per deployment target (only one pipeline can deploy to Prod at a time, but you can deploy to Prod and Staging at the same time), or per input source (only deploy my Git repo to one place at a time), or per authorizing user (Alice can only deploy to one place at a time).

Users might also want to limit TaskRun concurrency, either when run as part of a PipelineRun or when executed directly.

We can experiment with supporting these various models and provide a runnable example of limiting concurrency that users can adapt to their own needs.

As an initial idea, a concurrency controller could be configured with a ConfigMap describing a concurrency key format, and a concurrency limit:

apiVersion: v1
kind: ConfigMap
metadata:
  name: concurrency-controller
data:
  concurrency-key: $(metadata.namespace)-$(spec.pipelineRef.name)
  # ConfigMap data values must be strings, so the limit is quoted
  concurrency-limit: "3"

In this example, the key would limit the execution of PipelineRuns referencing the same Pipeline, running in the same namespace, to a max of 3. The concurrency controller would watch for Pending PipelineRuns, derive their keys, count ongoing PipelineRuns with the matching key, and choose to start the new Pending PipelineRun if count < limit. When a PipelineRun finishes, the concurrency controller would reevaluate any Pending runs, and choose one to start if it's under the limit.
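The flow described above (derive a key, count ongoing runs with that key, start a Pending run if count < limit) can be sketched roughly as follows. This is only an illustration under stated assumptions: PipelineRuns are modelled as plain dicts, the `state` field and the helper names `derive_key` and `runs_to_start` are hypothetical, and no Tekton or Kubernetes API is involved.

```python
import re

# Matches placeholders of the form $(dotted.path), as in the ConfigMap example.
_PLACEHOLDER = re.compile(r"\$\(([^)]+)\)")


def derive_key(key_format, run):
    """Expand $(dotted.path) placeholders against a PipelineRun-like dict."""
    def lookup(match):
        value = run
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return _PLACEHOLDER.sub(lookup, key_format)


def runs_to_start(key_format, limit, runs):
    """Return names of Pending runs that can start without exceeding
    `limit` running runs per concurrency key.

    `runs` is assumed to be ordered oldest first; "state" is a simplified,
    hypothetical stand-in for however the real controller tracks status.
    """
    counts = {}
    for run in runs:
        if run["state"] == "Running":
            key = derive_key(key_format, run)
            counts[key] = counts.get(key, 0) + 1

    started = []
    for run in runs:
        if run["state"] != "Pending":
            continue
        key = derive_key(key_format, run)
        if counts.get(key, 0) < limit:
            counts[key] = counts.get(key, 0) + 1
            started.append(run["metadata"]["name"])
    return started
```

Because the key is built from arbitrary dotted paths, other bucketing schemes (per namespace only, per deployment target, and so on) would only require a different key format in this sketch.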

(This is just one idea for describing this; if you have something else in mind, please contribute it below.)

Jan 26 '21 15:01 imjasonh

Here are the issues / PRs / TEPs related to this that I have seen so far:

Big +1 from my pov on making this a component external to Pipelines.

Jan 26 '21 15:01 ghost

Should there be some sort of load-shedding?

Can you queue PipelineRuns forever? Do they time out?

Jan 26 '21 15:01 bigkevmcd

cc @jbarrick-mesosphere for his work on the Pending TEP

Jan 26 '21 15:01 imjasonh

Should there be some sort of load-shedding?

Can you queue PipelineRuns forever? Do they time out?

Excellent question! This seems like another useful configuration for the limiter: a max age for a Pending run before dropping it on the ground.

Users might also want to be able to describe/derive a priority, which would weight a Pending PipelineRun ahead of others in the same concurrency bucket. edit: Along with priority comes preemption -- e.g., a new high-priority Pending PipelineRun should cancel an ongoing run to make room for it.
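As a rough sketch of how a max age and a priority could interact within one concurrency bucket: expired runs are set aside first, then the highest-priority remaining run is chosen, with ties going to the oldest run. The field names `created` and `priority` and the function `pick_next` are assumptions made for illustration, and preemption of already running runs is deliberately left out.

```python
def pick_next(pending, now, max_age_seconds):
    """Choose the next Pending run to start from a single concurrency bucket.

    Each run is a dict with hypothetical fields:
      "created":  creation time in seconds
      "priority": larger numbers are more urgent

    Returns (next_run_or_None, expired_runs).
    """
    expired = [r for r in pending if now - r["created"] > max_age_seconds]
    live = [r for r in pending if now - r["created"] <= max_age_seconds]
    # Highest priority first; ties are broken by the oldest creation time.
    live.sort(key=lambda r: (-r["priority"], r["created"]))
    return (live[0] if live else None), expired
```

What happens to the expired runs (cancel, fail, or just report) is left to the caller in this sketch.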

Ultimately the deliverable here isn't a production-grade maximally configurable controller, just a minimally useful example that operators can potentially modify to their own needs.

Jan 26 '21 15:01 imjasonh

I'm currently facing this issue trying to do "branch preview", i.e. building and deploying each branch on every push to separate URLs. Multiple pushes to multiple branches can run in parallel, but multiple pushes to a single branch should be processed in order, one at a time. I believe this use case would require the concurrency key format to have access to PipelineRun fields so that the branch name could be included.

Feb 28 '21 01:02 mjgallag

+1

May 19 '21 11:05 julweber

+1

Jul 01 '21 08:07 eccox

+1 for a limit on simultaneous PipelineRuns.

Would like something as simple as

kind: Pipeline
...
spec:
  runPolicy:
    type: Parallel
    parallel:
      maxLimit: 3

with Sequential and LatestOnly as alternatives: the former executes run requests in their natural order, starting the next one when the previous one finishes or is cancelled, and the latter cancels any previous runs as soon as a new run is created.
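To make the three proposed policies concrete, here is a small sketch of the decision taken when a new run is created. `apply_run_policy` is a hypothetical function, not an existing Tekton or OpenShift API, and `active` simply stands for the names of runs currently executing for the same Pipeline.

```python
def apply_run_policy(policy, active, max_limit=1):
    """Decide what to do with a newly created run under a run policy.

    Returns (start_now, runs_to_cancel):
      Parallel:   start if fewer than max_limit runs are active
      Sequential: start only when nothing else is active
      LatestOnly: always start, cancelling every active run
    """
    if policy == "Parallel":
        return len(active) < max_limit, []
    if policy == "Sequential":
        return len(active) == 0, []
    if policy == "LatestOnly":
        return True, list(active)
    raise ValueError(f"unknown run policy: {policy}")
```

A run that is not started right away would stay Pending and be reconsidered when an active run finishes or is cancelled.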

I also expect this functionality to belong in the Tekton operator's domain, because it would be strange for some external component to decide whether the pipeline operator should start processing the next run or wait.

I assume the Pending state is meant for situations like this.

I refer to how the OpenShift operators handle parallelism for BuildConfigs and Builds as a direct analogy and a good example of how it should be done.

https://docs.okd.io/4.7/cicd/builds/advanced-build-operations.html#builds-build-run-policy_advanced-build-operations

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/build/build_controller.go#L714

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/policy/serial.go

Jul 08 '21 12:07 dbazhal

I suppose run policy is also somehow connected with https://github.com/tektoncd/operator/issues/209

Jul 08 '21 12:07 dbazhal

As an alternative to operator functionality, I can make pipeline runs acquire a lock in the first task of the pipeline and release it in the finally tasks. But that would break any pipeline run timing metrics, as pipelines would appear to run much longer (including the time spent waiting for the lock to be released). I'd like pipeline duration numbers to contain only "useful" info, showing how long task execution took, not how long the pipeline was waiting for another run to complete.

Jul 08 '21 12:07 dbazhal

This is an important feature I have found and used in most CI systems.

Usually you cannot afford to have two pipeline runs running at the same time if they modify a shared resource, for example if they result in API calls to a single instance of a system.

An approach like the one used in GitHub Actions seems an elegant way of implementing this feature: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#concurrency

Oct 13 '21 22:10 juliaaano

This would be useful for serializing Terraform runs:

If I have several triggers around the same time, Terraform should only run one at a time, in the order they came in.
An option to cancel intermediate runs - so that any pending runs are cancelled by newer pending runs, but running runs are not cancelled.
The queue should be keyed - it's not so much the Terraform pipeline that needs to be serialized, but Terraform runs of a specific key that need to be serialized.
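A minimal sketch of such a keyed queue, with the "cancel intermediate runs" option switched on: at most one run per key executes, at most one waits, and a newer submission replaces the waiting one without touching the running one. `KeyedQueue` and its methods are hypothetical names used for illustration only.

```python
class KeyedQueue:
    """Serialize runs per key, keeping only the newest pending run."""

    def __init__(self):
        self.running = {}  # key -> name of the run currently executing
        self.pending = {}  # key -> name of the single run waiting

    def submit(self, key, name):
        """Register a new run.

        Returns (started, cancelled): the run started immediately (or None)
        and the older pending run displaced by this one (or None).
        """
        if key not in self.running:
            self.running[key] = name
            return name, None
        cancelled = self.pending.get(key)
        self.pending[key] = name
        return None, cancelled

    def finish(self, key):
        """Mark the running run for `key` as done and start the waiting run.

        Returns the name of the run started next, or None.
        """
        self.running.pop(key, None)
        nxt = self.pending.pop(key, None)
        if nxt is not None:
            self.running[key] = nxt
        return nxt
```

Without the cancel option, `pending` would hold an ordered list per key instead of a single entry, and `finish` would start the oldest entry.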

I was able to implement this in Jenkins but the syntax for it was torturous.

Oct 26 '21 06:10 dmikalova

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

Jan 24 '22 07:01 tekton-robot

/remove-lifecycle stale
/lifecycle frozen

Jan 24 '22 10:01 dbazhal

+1

Nov 02 '22 17:11 david972

👍

Mar 21 '23 13:03 shaharb-hs

Hi. I would like to run my databases in parallel, so that when I give the flyway command it is applied to all the databases in parallel. I don't want the process to happen sequentially. Any ideas would be helpful.

Jul 11 '23 09:07 AshwinSridharan0410

Any updates on this? I found this workaround: https://holly-k-cummins.medium.com/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297 but it is not native and does not provide ordering.

Jul 31 '23 18:07 emirot

With TEP-0135's coscheduling mode, PVCs are deleted when PipelineRuns finish. Maybe adding a ResourceQuota on the number of PVCs would therefore limit the number of concurrent PipelineRuns to that quota?

Sep 06 '23 20:09 jimmyjones2
