Idea: Pipeline Mutexes

Open wlynch opened this issue 4 years ago • 28 comments

This is an idea that @k floated to me a while back; I finally got around to opening an issue to discuss it. What I'm curious about:

  • Is this a use case we want to focus on?
  • Is it worth making this a built-in feature (as opposed to a Catalog feature)?
  • Are there any other features / alternatives to consider?

Details intentionally vague - this is a "should we do this?" issue, not a "how we'll do this" issue.

Idea

I may want to control how Pipelines run in relation to others and ensure that only 1 pipeline for a given selector can run at a time (hence a "mutex"). I may want to reject a new Pipeline if a similar one is already running, or queue it up and just make sure it does not run in parallel. This might be because:

  • I have a presubmit pipeline that I want to run only one instance of at a time per pull request, to reduce costs (e.g. if someone pushes multiple commits, I only need to run the most recent one and can cancel the currently running pipelines).
  • My pipeline mutates some external state, and I want to make sure only one thing operates on it at a time.

Possible solution

Have a mechanism to select conditions to allow Pipeline execution, as well as a strategy for what to do in response.

Examples

If a new Pipeline is created that was labelled as a pull request, cancel existing runs.

selector: repo=foo, type=pullrequest
strategy: cancel

Only run 1 pipeline at a time that was labelled as being started by a push to master. (does not guarantee ordering)

selector: repo=foo, type=push, ref=master
strategy: queue

Deny new pipeline create requests if they match a pipeline currently running.

selector: repo=foo, type=push, ref=master
strategy: deny
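
The three examples above can be read as one decision rule. The following is a minimal Python sketch of that rule, not a proposed Tekton API: the names `Strategy`, `Run`, `matches` and `decide` are hypothetical, and the selector is assumed to be a plain label-equality match.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    CANCEL = "cancel"  # cancel matching runs, then start the new one
    QUEUE = "queue"    # hold the new run until no matching run is active
    DENY = "deny"      # reject the new run outright


@dataclass
class Run:
    name: str
    labels: dict


def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def decide(selector: dict, strategy: Strategy, new_run: Run, running: list):
    """Return (action for the new run, names of running runs to cancel)."""
    if not matches(selector, new_run.labels):
        return ("start", [])  # the mutex does not apply to this run
    conflicts = [r.name for r in running if matches(selector, r.labels)]
    if not conflicts:
        return ("start", [])
    if strategy is Strategy.CANCEL:
        return ("start", conflicts)
    if strategy is Strategy.QUEUE:
        return ("wait", [])
    return ("reject", [])
```

As in the `queue` example above, this sketch says nothing about the order in which waiting runs are released.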

Alternatives

Implement as a task

  • Cancellation could be handled by having the first step of every pipeline include something along the lines of

    kubectl delete pipelinerun -l foo=bar
    

    This would clobber any other Pipelines with that label.

  • Queueing could be handled by having a Condition that runs kubectl get for running pods and only lets the Pipeline proceed if the check passes. This is difficult, since you'd have to get creative in inspecting runtime information of other runs (e.g. are they also in a wait state, or are they running?). It also creates container waste, since the pipelines would all be running.

  • Deny could not be implemented this way.
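
One detail of the cancellation alternative: a delete by label also matches the run issuing the command if that run carries the same label. A minimal Python sketch of selecting only the other runs follows. The function name `runs_to_cancel` and the dictionary shape (`name`, `labels`, `done`) are assumptions for illustration, not Tekton's object model.

```python
def runs_to_cancel(runs: list, selector: dict, current_name: str) -> list:
    """Names of unfinished runs matching the selector, excluding the caller."""
    return [
        r["name"]
        for r in runs
        if r["name"] != current_name          # never cancel ourselves
        and not r.get("done", False)          # finished runs need no action
        and all(r.get("labels", {}).get(k) == v for k, v in selector.items())
    ]
```

The resulting names could then be deleted or cancelled one by one instead of by label.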

wlynch avatar Jun 17 '20 22:06 wlynch

/kind feature

vdemeester avatar Jun 18 '20 07:06 vdemeester

We've built a queueing system to manage our way around this problem, so a +1 for it being a useful thing to tackle. I don't know whether it should be a core primitive or in the catalog, but in our use case it was necessary at a very early stage, and for deployments specifically it seems to me that having one deployment per app per environment at a time, and ideally in a sensible order, is going to be a very common requirement. So I'm leaning towards 'core'.

tragiclifestories avatar Jun 18 '20 16:06 tragiclifestories

I'd really appreciate this as well, perhaps with Task granularity rather than pipeline. My use case has to do with cross-talk between concurrent runs of integration tests/db management. In an integration test scenario, for example, the tasks depend on an external resource. If that resource is stateful (like a database), some tasks may be rebuilding the database while others are executing tests that use it. I'd love to be able to single-thread pipeline runs through the integration test phase.

holly-cummins avatar Jun 22 '20 20:06 holly-cummins

I also think this is a very common use case for a CI/CD Pipeline. Our scenario is that we have one test cluster for all the created PRs. For instance, when 2 developers open 2 PRs, the pipeline should test one PR against the test cluster first and set the status for the corresponding PR. Meanwhile, all PipelineRuns for other PRs should be queued until the cluster is free for the next run.

resamaraschi avatar Jun 26 '20 14:06 resamaraschi

I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

holly-cummins avatar Jun 30 '20 20:06 holly-cummins

Interesting!

We took a different approach by storing the queue data in configmaps and defining all the queue operations as scripts that run in task steps. So no explicit modelling through CRDs but it works well enough for our use case.

Hopefully we'll get around to the blog-post stage of the project soon.
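
For readers wondering what queue operations over ConfigMap data might look like, here is a minimal Python sketch. It is not the implementation described in this comment: the key name `queue`, the JSON encoding and the function names are all assumptions. It works on the ConfigMap's string-to-string `data` map as a plain dict and leaves the Kubernetes read/update calls out.

```python
import json

QUEUE_KEY = "queue"  # hypothetical key inside the ConfigMap's data map


def _load(data: dict) -> list:
    """Decode the queue stored as a JSON list; an absent key means empty."""
    return json.loads(data.get(QUEUE_KEY, "[]"))


def enqueue(data: dict, run_name: str) -> dict:
    """Return new data with run_name appended unless it is already queued."""
    queue = _load(data)
    if run_name not in queue:
        queue.append(run_name)
    return {**data, QUEUE_KEY: json.dumps(queue)}


def is_next(data: dict, run_name: str) -> bool:
    """True when run_name is at the head of the queue and may proceed."""
    queue = _load(data)
    return bool(queue) and queue[0] == run_name


def dequeue(data: dict, run_name: str) -> dict:
    """Return new data with run_name removed (called when the run finishes)."""
    queue = [n for n in _load(data) if n != run_name]
    return {**data, QUEUE_KEY: json.dumps(queue)}
```

A real version would also need to write the ConfigMap back with a conflict check, so that two runs enqueueing at the same moment do not overwrite each other.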

tragiclifestories avatar Jun 30 '20 22:06 tragiclifestories

I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

Nice :) @pritidesai finally in action

afrittoli avatar Jul 01 '20 04:07 afrittoli

Here is the use case we currently have. Imagine this simplified CD pipeline:

--> DeployToDev --> TestOnDev --> DeployToPreProd --> TestOnPreProd --> DeployToProd --> SmoketestOnProd

There are roughly three sections: Dev, PreProd and Prod. The pipeline is started for a commit in a Git repository that holds the configuration of an application (GitOps). Now here are some requirements:

  • if a PipelineRun is in TestOnPreProd we don't want another PipelineRun to start DeployToPreProd because the test should be the result of the configuration that the first PipelineRun was started with.
  • but it is ok to have another PipelineRun starting to DeployToDev.
  • a PipelineRun must not overtake another PipelineRun because the CD pipeline should deliver the stuff in the order in which the configurations were committed to the Git repository.
  • to make things worse - we use the same pipeline to deploy different applications. So the exclusivity should be per section and application.

Currently we use a task in front of the section that polls a REST service to "ask to enter the section". The implementation of the REST service is specific to our pipeline and uses the Tekton API to analyse the state of all the PipelineRuns. It's ugly :blush: but it works so far.
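
The "ask to enter the section" check implied by the requirements above can be sketched in a few lines of Python. This is an illustration, not the REST service described here: `SectionRun`, `may_enter`, the `seq` field (commit order) and the `position` field are hypothetical. It assumes a run keeps holding its current section until it enters the next one, and that `runs` contains only active runs.

```python
from dataclasses import dataclass

SECTIONS = ["dev", "preprod", "prod"]


@dataclass
class SectionRun:
    name: str
    app: str
    seq: int       # commit order within the app; lower means committed earlier
    position: int  # index into SECTIONS of the section held; -1 before the first


def may_enter(run: SectionRun, section: str, runs: list) -> bool:
    """True when run may enter section without sharing it or overtaking."""
    target = SECTIONS.index(section)
    for other in runs:
        if other.name == run.name or other.app != run.app:
            continue  # exclusivity is per section *and* application
        if other.position == target:
            return False  # another run of this app holds the section
        if other.seq < run.seq and other.position < target:
            return False  # an earlier commit has not reached the section yet
    return True
```

With one run in preprod and a later one in dev, the later run may not enter preprod, while a run for a different application is unaffected.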

PoliM avatar Aug 11 '20 20:08 PoliM

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Nov 09 '20 21:11 tekton-robot

I'd prefer both task and pipeline granularity mutex.

Letty5411 avatar Nov 10 '20 06:11 Letty5411

Hi @pritidesai , is there any update about this issue? Thanks :)

Letty5411 avatar Nov 10 '20 07:11 Letty5411

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar Dec 10 '20 08:12 tekton-robot

+1

julweber avatar Apr 21 '21 13:04 julweber

Dear tekton Team,

is there any update about this?

It would be great if there was an option to set pipeline runs to a serial mode. In the scenario that a new pipeline run is added while a pipeline run for the same pipeline within the same namespace is already running, the new run could wait for the existing one to finish before being executed.

For example: Concourse CI allows this via a serial job - https://concourse-ci.org/serial-job-example.html

Kind regards

julweber avatar Apr 21 '21 13:04 julweber

is there any update about this?

@julweber this is being explored in tekton experimental repo: https://github.com/tektoncd/experimental/issues/699 - @ImJasonH shared an idea in that issue

jerop avatar Apr 21 '21 13:04 jerop

Hey @jerop ,

Sorry for the late reply. Thanks a lot for the link, I will have a look.

Cheers, Julian

julweber avatar May 19 '21 11:05 julweber

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jun 18 '21 11:06 tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Jun 18 '21 11:06 tekton-robot

/reopen

afrittoli avatar Oct 22 '21 12:10 afrittoli

@afrittoli: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Oct 22 '21 12:10 tekton-robot

/remove-lifecycle rotten

afrittoli avatar Oct 22 '21 12:10 afrittoli

Feels like this could be part of a possible solution to the discussion we've been having over at: https://github.com/tektoncd/plumbing/issues/888#issuecomment-943472664

If we want to take this forward, I think what will really help is fleshing out the use cases that this feature would solve; @afrittoli this might not be quite the behavior you'd want for some of our common dogfooding use cases (tho it would be better than having a race!):

I may want to reject a new Pipeline if a similar one is already running, or queue it up and just make sure it does not run in parallel.

For PR triggered PipelineRuns/TaskRuns I think what you often want is to run the newest one and cancel the others (e.g. imagining a PR being updated after kicking off PipelineRuns/TaskRuns)

bobcatfish avatar Oct 25 '21 22:10 bobcatfish

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jan 23 '22 22:01 tekton-robot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar Feb 22 '22 22:02 tekton-robot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot avatar Mar 24 '22 23:03 tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Mar 24 '22 23:03 tekton-robot

/lifecycle frozen

lbernick avatar Aug 05 '22 19:08 lbernick

Related to https://github.com/tektoncd/community/pull/716

vdemeester avatar Aug 18 '22 13:08 vdemeester