Idea: Pipeline Mutexes
This was an idea that @k floated to me awhile back, but I finally got around to making an issue to discuss. What I'm curious about:
- Is this a use case we want to focus on?
- Is this worth making a built-in feature (as opposed to a Catalog feature)?
- Any other features / alternatives to consider?
Details intentionally vague - this is a "should we do this?" issue, not a "how we'll do this" issue.
Idea
I may want to control how Pipelines run in relation to others and ensure only one pipeline for a given selector can run at a time (hence a "mutex"). I may want to reject new Pipelines if a similar one is already running, or queue them up and just make sure they do not run in parallel. This might be because:
- I have a presubmit pipeline that I only want to run one instance of at a time per pull request, to reduce costs (e.g. if someone pushes multiple commits, I only need to run the most recent and can cancel the currently running pipelines).
- My pipeline mutates some external state, and I want to make sure only one thing operates on it at a time.
Possible solution
Have a mechanism to select the conditions under which Pipeline execution is allowed, as well as a strategy for what to do in response.
Examples
If a new Pipeline is created that is labelled as a pull request, cancel existing runs.
```
selector: repo=foo, type=pullrequest
strategy: cancel
```
Only run one pipeline at a time that is labelled as being started by a push to master (does not guarantee ordering).
```
selector: repo=foo, type=push, ref=master
strategy: queue
```
Deny new pipeline creation requests if they match a currently running pipeline.
```
selector: repo=foo, type=push, ref=master
strategy: deny
```
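To make the selector/strategy semantics above concrete, here is a minimal Python sketch; `Run`, `MutexPolicy`, and `admit` are illustrative names, not a proposed API:

```python
from dataclasses import dataclass

@dataclass
class Run:
    name: str
    labels: dict  # e.g. {"repo": "foo", "type": "pullrequest"}

@dataclass
class MutexPolicy:
    selector: dict  # labels a run must carry for the policy to apply
    strategy: str   # "cancel" | "queue" | "deny"

def matches(run: Run, selector: dict) -> bool:
    # A run matches when it carries every selector label with the same value.
    return all(run.labels.get(k) == v for k, v in selector.items())

def admit(new: Run, running: list, policy: MutexPolicy):
    """Decide what to do with a new run: (accepted, runs_to_cancel, queued)."""
    if not matches(new, policy.selector):
        return True, [], False  # policy does not apply to this run
    conflicts = [r for r in running if matches(r, policy.selector)]
    if not conflicts:
        return True, [], False  # mutex is free
    if policy.strategy == "cancel":
        return True, conflicts, False  # newest wins; cancel the rest
    if policy.strategy == "queue":
        return True, [], True          # admit, but hold until conflicts finish
    if policy.strategy == "deny":
        return False, [], False        # reject the create request outright
    raise ValueError(f"unknown strategy {policy.strategy!r}")
```

For example, with `strategy: cancel` a second PipelineRun for the same PR would be admitted and the first one returned for cancellation.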
Alternatives
Implement as a task
- Cancellation could be handled by having the first step of every pipeline include something along the lines of `kubectl delete pipelinerun -l foo=bar`. This would clobber any other Pipelines with a particular label.
- Queueing could be handled by having a Condition that runs `kubectl get` for running pods, and only proceeds if a condition is true. This is difficult since you'd have to get creative in inspecting runtime information of other runs (e.g. are they also in a wait state, or are they actually running?). This also creates container waste, since the pipelines would all be running.
- Deny could not be implemented this way.
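The cancellation variant could be sketched as a Tekton Task along these lines (illustrative only: the Task name, image, and label are made up, and the run's ServiceAccount would need RBAC permission to delete PipelineRuns):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: cancel-duplicates   # hypothetical name
spec:
  steps:
    - name: cancel
      image: bitnami/kubectl   # any image with kubectl on the PATH
      script: |
        # Delete every other PipelineRun carrying the same label.
        kubectl delete pipelinerun -l foo=bar
```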
/kind feature
We've built a queueing system to manage our way around this problem, so a +1 for it being a useful thing to tackle. I don't know whether it should be a core primitive or in the catalog, but in our use case it was necessary at a very early stage, and for deployments specifically it seems to me that having one deployment per app per environment at a time, and ideally in a sensible order, is going to be a very common requirement. So I'm leaning towards 'core'.
I'd really appreciate this as well, perhaps with Task granularity rather than Pipeline. My use case has to do with cross-talk between concurrent runs of integration tests/db management. In an integration test scenario, for example, the tasks depend on an external resource. If that resource is stateful (like a database), some tasks may be rebuilding the database while others are executing tests against it. I'd love to be able to single-thread pipeline runs through the integration test phase.
I also think this is a very common use case for a CI/CD pipeline. Our scenario is that we have one test cluster for all the created PRs. For instance, when two developers open two PRs, the pipeline should test one PR against the test cluster first and set the status for the corresponding PR. Meanwhile, all other PipelineRuns for other PRs should be queued until the cluster is free for the next run.
I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297
Interesting!
We took a different approach by storing the queue data in configmaps and defining all the queue operations as scripts that run in task steps. So no explicit modelling through CRDs but it works well enough for our use case.
Hopefully we'll get around to the blog-post stage of the project soon.
> I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297
Nice :) @pritidesai finally in action
Here is the use case we currently have. Imagine this simplified CD pipeline:
--> DeployToDev --> TestOnDev --> DeployToPreProd --> TestOnPreProd --> DeployToProd --> SmoketestOnProd
There are something like three sections: Dev, PreProd and Prod.
The pipeline is started for a commit in a Git repository that holds the configuration of an application (GitOps). Now here are some requirements:
- If a PipelineRun is in `TestOnPreProd`, we don't want another PipelineRun to start `DeployToPreProd`, because the test should be the result of the configuration that the first PipelineRun was started with.
- But it is ok to have another PipelineRun starting `DeployToDev`.
- A PipelineRun must not overtake another PipelineRun, because the CD pipeline should deliver the stuff in the order the configurations were committed to the Git repository.
- To make things worse, we use the same pipeline to deploy different applications. So the exclusivity should be per section and application.
Currently we use a task in front of the section that polls a REST service to "ask to enter the section". The implementation of the REST service is specific to our pipeline and uses the Tekton API to analyse the state of all the PipelineRuns. It's ugly :blush: but it works so far.
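The per-(section, application) exclusivity described above could be modelled roughly as follows. This is an in-process sketch only (`SectionGate` is a made-up name, and the real REST service would also need to enforce commit order, which this ignores):

```python
import threading

class SectionGate:
    """Hypothetical gate: at most one PipelineRun per (section, application)."""

    def __init__(self):
        self._held = set()
        self._lock = threading.Lock()

    def try_enter(self, section: str, app: str) -> bool:
        # "Ask to enter the section"; False means poll again later.
        with self._lock:
            if (section, app) in self._held:
                return False
            self._held.add((section, app))
            return True

    def leave(self, section: str, app: str) -> None:
        # Release the section so the next queued run can enter.
        with self._lock:
            self._held.discard((section, app))
```

Note that `PreProd`/`app-a` excludes another `PreProd`/`app-a` run, but not `Dev`/`app-a` or `PreProd`/`app-b`.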
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Send feedback to tektoncd/plumbing.
I'd prefer a mutex at both Task and Pipeline granularity.
Hi @pritidesai , is there any update about this issue? Thanks :)
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten
with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close
with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen
with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
+1
Dear tekton Team,
is there any update about this?
It would be great if there was an option to set pipeline runs to a serial mode. In the scenario that a new pipeline run is added while a pipeline run for the same pipeline within the same namespace is already running, the new run could wait for the existing one to finish before being executed.
For example: Concourse CI allows this via a serial job - https://concourse-ci.org/serial-job-example.html
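For reference, the linked Concourse behavior is a single job-level setting; roughly like this (job and task names are made up):

```yaml
jobs:
  - name: deploy
    serial: true   # builds of this job never overlap; new ones queue up
    plan:
      - task: deploy
        file: ci/deploy.yml
```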
Kind regards
is there any update about this?
@julweber this is being explored in tekton experimental repo: https://github.com/tektoncd/experimental/issues/699 - @ImJasonH shared an idea in that issue
Hey @jerop ,
sorry for the late reply. Thanks a lot for the link, i will have a look.
Cheers, Julian
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen
with a justification.
Mark the issue as fresh with /remove-lifecycle rotten
with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen
with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@afrittoli: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
Feels like this could be part of a possible solution to the discussion we've been having over at: https://github.com/tektoncd/plumbing/issues/888#issuecomment-943472664
If we want to take this forward, I think what will really help is fleshing out the use cases that this feature would solve. @afrittoli this might not be quite the behavior you'd want for some of our common dogfooding use cases (tho it would be better than having a race!):
> I may want to reject new Pipelines if a similar one is already running, or queue it up and just make sure it does not run in parallel.
For PR triggered PipelineRuns/TaskRuns I think what you often want is to run the newest one and cancel the others (e.g. imagining a PR being updated after kicking off PipelineRuns/TaskRuns)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen
with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten
with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close
with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen
with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen
with a justification.
Mark the issue as fresh with /remove-lifecycle rotten
with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen
with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lifecycle frozen
Related to https://github.com/tektoncd/community/pull/716