experimental icon indicating copy to clipboard operation
experimental copied to clipboard

Concurrency limiter controller

Open imjasonh opened this issue 4 years ago • 19 comments

Opening this issue to collect ideas, discussion, interest, etc., for a supplemental PipelineRun controller (and possibly TaskRun controller?) that manages Pending PipelineRuns and update them to a Running state to limit execution concurrency.

We've heard a few use cases for limiting execution concurrency, but so far it's been hard to generalize the various needs into one single unified "concurrency" concept that we can apply across all of Tekton Pipelines. Some users might only want to have "deployment" pipeline running at a time, across the whole cluster. Others might want one "deployment" pipeline per namespace, or per deployment target (only one pipeline can deploy to Prod at a time, but you can deploy to Prod and Staging at the same time), or per input source (only deploy my Git repo to one place at a time), or per authorizing user (Alice can only deploy to one place at a time).

Users might also want to limit TaskRun concurrency, either when run as part of a PipelineRun or when executed directly.

We can experiment with supporting these various models and provide a runnable example of limiting concurrency, that users can adapt to their own needs.


As an initial idea, a concurrency controller could be configured with a ConfigMap describing a concurrency key format, and a concurrency limit:

kind: ConfigMap
metadata:
  name: concurrency-controller
data:
  concurrency-key: $(metadata.namespace)-$(spec.pipelineRef.name)
  concurrency-limit: 3

In this example, the key would limit the execution of PipelineRuns referencing the same Pipeline, running in the same namespace, to a max of 3. The concurrency controller would watch for Pending PipelineRuns, derive their keys, count ongoing PipelineRuns with the matching key, and choose to start the new Pending PipelineRun if count < limit. When a PipelineRun finishes, the concurrency controller would reevaluate any Pending runs, and choose one to start if it's under the limit.

(This is just one idea for describing this, if you have something else in mind please contribute it below)

imjasonh avatar Jan 26 '21 15:01 imjasonh

Here are the issues / PRs / TEPs related to this that I have seen so far:

Big +1 from my pov on making this a component external to Pipelines.

ghost avatar Jan 26 '21 15:01 ghost

Should there be some sort of load-shedding?

Can you queue PipelineRuns for ever? Do they timeout?

bigkevmcd avatar Jan 26 '21 15:01 bigkevmcd

cc @jbarrick-mesosphere for his work on the Pending TEP

imjasonh avatar Jan 26 '21 15:01 imjasonh

Should there be some sort of load-shedding?

Can you queue PipelineRuns for ever? Do they timeout?

Excellent question! This seems like another useful configuration for the limiter, max age before dropping it on the ground.

Users might also want to be able to describe/derive a priority, which would weight a Pending PipelineRun ahead of others in the same concurrency bucket. edit: Along with priority comes preemption -- e.g., a new high-priority Pending PipelineRun should cancel an ongoing run to make room for it.

Ultimately the deliverable here isn't a production-grade maximally configurable controller, just a minimally useful example that operators can potentially modify to their own needs.

imjasonh avatar Jan 26 '21 15:01 imjasonh

I'm currently facing this issue trying to do "branch preview", i.e. building and deploying each branch on every push to separate urls. Multiple pushes to multiple branches can run in parallel but multiple pushes to a single branch should be processed in order one at a time. I believe this use case would require the concurrency key format to have access to PipelineRun fields so that branch name could be included.

mjgallag avatar Feb 28 '21 01:02 mjgallag

+1

julweber avatar May 19 '21 11:05 julweber

+1

eccox avatar Jul 01 '21 08:07 eccox

Plusing simultaneous pipelineruns limit.

Would like something as simple as

kind: Pipeline
...
spec:
    runPolicy:
      type: Parallel
      parallel:
          maxLimit: 3

with alternatives as Sequential, and LatestOnly, first executing run requests in natural order, starting next one when previous finishes or cancelled, and last one cancelling any previous runs as the new run is created.

And i expect that this functionality is tekton operator domain, because it would be strange if some external would decide should pipeline operator start processing next run, or should it wait.

I assume pending state is for situations like this.

I refer to openshift operators processing parallelism for BuildConfigs and Builds as straight analogy and good example how it should be done.

https://docs.okd.io/4.7/cicd/builds/advanced-build-operations.html#builds-build-run-policy_advanced-build-operations

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/build/build_controller.go#L714

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/policy/serial.go

dbazhal avatar Jul 08 '21 12:07 dbazhal

I suppose run policy is also somehow connected with https://github.com/tektoncd/operator/issues/209

dbazhal avatar Jul 08 '21 12:07 dbazhal

As an alternative to operator functionality, i can make pipeline runs lock on something with first task of pipeline, and release lock with the finally. But it would break any pipeline run timing metrics as pipelines will start running much longer(including lock release wait time). I'd like pipeline duration numbers contain only "useful" info, showing how long task execution took, but not how long pipeline was waiting for another run to complete.

dbazhal avatar Jul 08 '21 12:07 dbazhal

This is an important feature I have found and used in most CI systems.

Usually it is not affordable having two pipeline runs running at the same time if they modify a shared resource, such as if they result in api calls to a single instance of a system.

An approach like the one used in GitHub Actions seems an elegant way of implementing this feature: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#concurrency

juliaaano avatar Oct 13 '21 22:10 juliaaano

This would be useful for serializing Terraform runs:

  • If I have several triggers around the same time Terraform should only run one at a time in the order they came in.
  • An option to cancel intermediate runs - so that any pending runs are cancelled by newer pending runs, but running runs are not cancelled.
  • The queue should be keyed - it's not so much the Terraform pipeline that needs to be serialized, but Terraform runs of a specific key that need to be serialized.

I was able to implement this in Jenkins but the syntax for it was torturous.

dmikalova avatar Oct 26 '21 06:10 dmikalova

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jan 24 '22 07:01 tekton-robot

/remove-lifecycle stale /lifecycle frozen

dbazhal avatar Jan 24 '22 10:01 dbazhal

+1

david972 avatar Nov 02 '22 17:11 david972

👍

shaharb-hs avatar Mar 21 '23 13:03 shaharb-hs

Hi. I would like to run my databases parallely so that when I give the flyway command, it should happen parallely to all the databases. I dont want the process to happen sequentially.Any idea would be helpful

AshwinSridharan0410 avatar Jul 11 '23 09:07 AshwinSridharan0410

Any updates on that ? Found that workaround https://holly-k-cummins.medium.com/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297 but this is not native and does not have ordering.

emirot avatar Jul 31 '23 18:07 emirot

With TEP-0135 coscheduling mode it'll delete PVCs when PipelineRuns are finished. Maybe adding a ResourceQuota for number of PVC will therefore limit the number of concurrent PipelineRuns to that limit?

jimmyjones2 avatar Sep 06 '23 20:09 jimmyjones2