Overriding Pipeline timeout does not work
Expected Behavior
A PipelineRun created from a Pipeline should not time out after 1 hour, but after 2h30m.
I used the following configuration:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-name
spec:
  tasks:
    - name: task-name
      taskRef:
        kind: Task
        name: task-name
      timeout: "2h30m0s"
Actual Behavior
A PipelineRun created from the Pipeline times out after 1 hour:
PipelineRun "pipeline-name-id" failed to finish within "1h0m0s"
The PipelineRun yaml configuration contains the following:
spec:
  pipelineRef:
    name: pipeline-name
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s
The pipeline timeout is not overridden, so the pipeline fails.
Steps to Reproduce the Problem
- Create a Pipeline with a task that takes over an hour to complete.
- Set the timeout of the task to more than 1 hour (as shown in the "Expected Behavior" section).
- Run the Pipeline and check the created PipelineRun config for the timeout limits.
Additional Info
I am running a pipeline with 2 tasks, one of which takes longer than an hour. I have specified the timeout limit for this task. However, when a PipelineRun is created by the Pipeline, the default value of 1 hour is not overwritten. I have followed the documentation on how to set a timeout for a pipeline, which seems to have to be done on the task-level.
In the PipelineRun yaml, created by the Pipeline, both:
tasks:
  - name: task-name
    taskRef:
      kind: Task
      name: task-name
    timeout: 2h30m0s
and
spec:
  pipelineRef:
    name: pipeline-name
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s
are present. Because the pipeline timeout is shorter than the task timeout, the pipeline will fail after an hour.
I have also tried to set the default timeout value to 2 hours via a ConfigMap. This did not work either:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  default-timeout-minutes: "150"
This still resulted in a 1 hour timeout limit.
PipelineRun
Using the following configuration for a PipelineRun does work. In this case the default value of 1 hour is overwritten by 2h40m. However, I would like to not have to create my PipelineRuns manually.
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: trigger
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: triggered
      spec:
        pipelineRef:
          name: pipeline-name
        timeout: "2h40m0s"
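For comparison, here is a minimal sketch of how the same template might look against the v1 API, where the PipelineRun's single timeout field is replaced by the timeouts block (the same block that appears in the generated PipelineRun above). The triggers.tekton.dev/v1beta1 apiVersion is an assumption about the installed Triggers release; the names and the 2h40m0s value are carried over from the template above.

```yaml
# Sketch only: assumes a Triggers release that serves triggers.tekton.dev/v1beta1
# and a Pipelines release that serves tekton.dev/v1.
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
  name: trigger
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1
      kind: PipelineRun
      metadata:
        generateName: triggered
      spec:
        pipelineRef:
          name: pipeline-name
        timeouts:
          # Upper bound for the whole run; replaces the v1beta1 "timeout" field.
          pipeline: 2h40m0s
```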
- Kubernetes version: v1.27.10+28ed2d7 (OpenShift 4.4)
  Output of kubectl version:
  Client Version: v1.27.4
  Kustomize Version: v5.0.1
  Server Version: v1.27.10+28ed2d7
- Tekton Pipeline version: tekton.dev/v1beta1
Using the following configuration for a PipelineRun does work. In this case the default value of 1 hour is overwritten by 2h40m. However, I would like to not have to create my PipelineRuns manually.
How is the PipelineRun created in your setup? I ask because it is most likely up to the "thing" that creates the PipelineRun to set the timeouts correctly.
Tekton Pipeline version: tekton.dev/v1beta1
v1beta1 is the API version; we also need the version of the installed Pipelines release (tkn version should display this).
The PipelineRun where the timeout value is correct is created with a Cronjob and EventListener. A PipelineRun created by a Pipeline does not seem to set the timeout value correctly.
I hope that answers your question regarding how the PipelineRun is created.
tkn version:
Client version: 0.33.0
Chains version: v0.19.0
Pipeline version: v0.53.3
Triggers version: v0.25.3
Operator version: v0.69.1
Edit: To add to the first answer: I use the OpenShift UI to start a PipelineRun, which I assume runs something similar to tkn pipeline start pipeline-name
Getting a similar issue on Kubernetes v1.32.0 :roll_eyes:
As of Pipelines v0.66.0 and apiVersion: tekton.dev/v1, the behavior has changed a bit.
Now Tekton uses 1h0m0s as a hard upper timeout: decreasing the task timeout works, but the run fails if the task takes more than 1 hour to return, e.g.:
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: default-cached-pipeline
  namespace: ci
spec:
  description: This pipeline clones a git repo, builds a Docker image with Kaniko
  ...
    - name: build-push-working
      runAfter:
        - clone
      timeout: 0h1m0s  # <--- this works
      taskRef:
        name: kaniko
  ...
    - name: build-push-failing
      runAfter:
        - clone
      timeout: 4h0m0s  # <--- this fails
      taskRef:
        name: kaniko
  ...
Client version: 0.39.0
Pipeline version: v0.66.0
Dashboard version: v0.53.0
@vdemeester do you know someone who'd be interested in looking into this issue?
I can look into this @afrittoli @vdemeester
Now tekton uses 1h0m0s as an hard upper timeout , providing a way to decrease the task timeout, but fails if the task takes more than 1hour to return
@tampler In your example, did you attempt to configure the Pipeline Run's timeout as well or were you just setting the Pipeline's Tasks' timeouts? Per the Pipeline Run docs, the Pipeline Run's timeout (defaulting to 1 hour) supersedes the Pipeline's Tasks' timeouts. So if your Pipeline Run does not specify a pipeline timeout I believe the behavior is expected.
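As an illustration of that precedence, a minimal sketch of a PipelineRun that raises the pipeline-level timeout above the 4h0m0s task timeout from the example could look like the following. The run name and the concrete durations are hypothetical; the pipeline name and namespace are taken from the example above. The tasks and finally budgets are optional and, taken together, should not exceed the pipeline value.

```yaml
# Sketch only: run name and durations are hypothetical.
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: default-cached-pipeline-run
  namespace: ci
spec:
  pipelineRef:
    name: default-cached-pipeline
  timeouts:
    # Upper bound for the whole run. If omitted, the configured default
    # (1h0m0s unless overridden) applies and caps any longer task timeout.
    pipeline: 4h30m0s
    # Optional budgets for the tasks section and the finally section.
    tasks: 4h15m0s
    finally: 15m0s
```

With a run like this, the 4h0m0s timeout on build-push-failing has room to take effect, because the pipeline-level limit no longer expires first.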
@MarijnJV it seems like your issue is: given a Pipeline which has a Task with a timeout longer than the default PipelineRun pipeline-timeout, when you created the PipelineRun via OpenShift's UI it did not override the default Pipeline Run pipeline-timeout (and possibly hard-coded the default of 1h0m0s as the pipeline's timeout?). This led to your Pipeline Run timing out after 1 hour, and subsequently the Task was cancelled. Is that correct?
@aThorp96 Thanks for looking into this 👀
- I set up the global default timeout in the tekton-pipelines/config-defaults config map and bumped the default timeout to 600 mins.
- I specified the task timeout in the pipeline as I showed above (see quotation).
The thing is that the timeout directive WORKS; however, there's a hardcoded upper limit somewhere in your code base which you should move into the global config map.
As I said, decreasing the timeout to a lower value (say, 3 mins) works fine. Bumping the timeout to 1h5m0s won't work due to your hardcoded ceiling value.
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: default-cached-pipeline
  namespace: ci
spec:
  description: This pipeline clones a git repo, builds a Docker image with Kaniko
  ...
    - name: build-push-failing
      runAfter:
        - clone
      timeout: 4h0m0s  # <--- this fails
      taskRef:
        name: kaniko
  ...
Client version: 0.39.0
Pipeline version: v0.66.0
Dashboard version: v0.53.0
@tampler I was unable to reproduce this issue, where the configured timeout-minutes is not applied and not respected, with the following setup:
Client version: 0.39.0
Pipeline version: v0.66.0
Triggers version: v0.30.1
Dashboard version: v0.53.0
Pipeline:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-timeout-test
spec:
  tasks:
    - name: test-long-timeout
      taskSpec:
        steps:
          - image: quay.io/quay/busybox
            script: |
              while true; do
                echo "$(date) beep"
                sleep 1
                echo "$(date) boop"
                sleep 1
              done
      timeout: "1h30m0s"
Defaults-config:
apiVersion: v1
data:
  default-timeout-minutes: "150"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: {...}
  creationTimestamp: "2025-01-20T19:07:57Z"
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
  name: config-defaults
  namespace: tekton-pipelines
  resourceVersion: "11503"
  uid: b59e9e77-c985-4225-81d8-86047e18de90
After applying the above, when creating a pipeline run via the Tekton Dashboard and also tkn pipeline start, the Pipeline Run was created correctly with the specified default timeout of 150 minutes:
Pipeline Run:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  creationTimestamp: "2025-01-20T19:56:55Z"
  generation: 1
  labels:
    tekton.dev/pipeline: pipeline-timeout-test
  name: test-with-more-than-hour-timeout
  namespace: default
  resourceVersion: "11717"
  uid: 1354c958-58d5-45f2-8a12-2feff8293cdb
spec:
  pipelineRef:
    name: pipeline-timeout-test
  taskRunTemplate:
    serviceAccountName: default
  timeouts:
    pipeline: 2h30m0s # <- note the pipeline-level timeout
status:
  childReferences:
    - apiVersion: tekton.dev/v1
      kind: TaskRun
      name: test-with-more-than-hour-timeout-test-long-timeout
      pipelineTaskName: test-long-timeout
  conditions:
    - lastTransitionTime: "2025-01-20T19:56:55Z"
      message: 'Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped:
        0'
      reason: Running
      status: Unknown
      type: Succeeded
  pipelineSpec:
    tasks:
      - name: test-long-timeout
        taskSpec:
          metadata: {}
          spec: null
          steps:
            - computeResources: {}
              image: quay.io/quay/busybox
              name: ""
              script: |
                while true; do
                  echo "$(date) beep"
                  sleep 1
                  echo "$(date) boop"
                  sleep 1
                done
        timeout: 1h30m0s # <- the task-specific timeout
Similarly, if I switched the Task timeout (defined in the Pipeline) to be greater than the Pipeline timeout (defined in the Pipeline Run), then the timeouts behaved as expected: the Pipeline timed out before the Task's configured timeout.
Two things to note though:
- @MarijnJV you mention "how to set a timeout for a pipeline, which seems to have to be done on the task-level". I think this is the source of some confusion here. The Pipeline specifies the timeouts for each Task, but the Pipeline Run specifies the timeout for the Pipeline, and the two timeouts are orthogonal. If the Pipeline timeout lapses, the Task timeout is not relevant as all of the Pipeline Run's Task Runs are immediately stopped as "timed-out". This could be clearer in the Pipeline docs you linked, and is more clearly explained in the Pipeline Run docs. I can maybe improve the Pipeline's timeout docs so that this distinction is clearer.
- @tampler Given the above, I was looking into how a PipelineRun may be created and the default-timeout-minutes not be applied, and I was unable to do so with valid configuration. What I did notice however was that the config-defaults configmap's pre-populated values are not applied. The values in the config-map by default are nested under a _example key. So if you just changed the value in the config map from "60" to "150" then it will have no effect. You need to move default-timeout-minutes out of _example so that it is at .data.default-timeout-minutes. After doing that, all pipeline-runs were created with the 150-minute pipeline-timeout. If you still experience the issue after ensuring the configmap is correct, do you mind manually creating the pipeline using tkn pipeline start <pipeline name> --pipeline-timeout "2h30m0s" and confirming that the pipeline run still times out after 1h0m0s?
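To make the _example point concrete, here is a minimal sketch of a config-defaults ConfigMap with the key in the place where it takes effect. The 150-minute value matches the setup above; the content shown under _example is abbreviated and only illustrates the shape of the shipped file, not its exact text.

```yaml
# Sketch only: the _example body is abbreviated for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  # Effective: a top-level key directly under .data.
  default-timeout-minutes: "150"
  # Not effective: _example is a single documentation string, so editing
  # the value inside it changes nothing.
  _example: |
    default-timeout-minutes: "60"  # 60 minutes
```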
@aThorp96 Thanks for the deep dive into this. Two things to note immediately:
- You tested with apiVersion: tekton.dev/v1beta1, not apiVersion: tekton.dev/v1.
- I didn't set up the top-level timeout in the task run.
I'll get back to you after applying your example and retesting on my setup. Thanks for your support 🙏
@aThorp96 I have not been able to reproduce this bug either, using your code example and a deadline in both the pipeline and the pipeline run.
I see you have already updated the docs for this issue. I guess the issue may be closed :v:
Great to hear!
@MarijnJV does the above address your original issue as well?
It has been a while, but I will look into it.
@aThorp96 I am able to reproduce it with the following configuration:
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: renovate
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
  default-timeout-minutes: "250" # 4h10m
Pipeline:
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: renovate-full
  namespace: renovate
spec:
  tasks:
    - name: run-renovate
      taskRef:
        kind: Task
        name: run-renovate
      timeout: 2h30m0s
This results in the following PipelineRun when I start it from the dashboard:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: renovate-full-3a43jg
  generation: 1
  namespace: renovate
  finalizers:
    - chains.tekton.dev/pipelinerun
  labels:
    tekton.dev/pipeline: renovate-full
spec:
  pipelineRef:
    name: renovate-full
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s # <-- This is the default timeout, but should be 4h10m (from configmap)
status:
  childReferences:
    - apiVersion: tekton.dev/v1
      kind: TaskRun
      name: renovate-full-3a43jg-run-renovate
      pipelineTaskName: run-renovate
  completionTime: '2025-01-24T10:02:18Z'
  conditions:
    - lastTransitionTime: '2025-01-24T10:02:18Z'
      message: 'Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0'
      reason: Succeeded
      status: 'True'
      type: Succeeded
  pipelineSpec:
    tasks:
      - name: run-renovate
        taskRef:
          kind: Task
          name: run-renovate
        timeout: 2h30m0s # <-- task timeout is correct
However, when using tkn pipeline start <pipeline name> --pipeline-timeout "2h30m0s", the timeout is correctly set. So the ConfigMap may be the root of the problem. It could also just be the case that I have not configured it correctly.
Client version: 0.33.0
Chains version: v0.20.1
Pipeline version: v0.59.4
Triggers version: v0.27.0
Operator version: v0.71.0
@MarijnJV can you reproduce it with tkn pipeline start <pipeline name>? This could be a CLI issue, where the CLI would use the "default timeout" of 1h from code instead of reading the configmap, but looking at the code, that shouldn't be the case 🤔.
Pipeline version: v0.59.4
The current version is v0.66.0 🙄. @MarijnJV please bump your versions and try again.
I would like to ask a question here, as I have tested this behaviour.
Is it possible to set the timeout for the whole pipeline directly in the pipeline definition?
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: testpipelinetimeout
  namespace: testns
spec:
  tasks:
    - name: testpipelinetimeout
      taskRef:
        kind: Task
        name: testpipelinetimeout
      timeout: 2h0m0s
This definition sets the task timeout to 2 hours. The pipeline timeout is taken from the default setting in the config map, which is 1h. This works as described.
I can also change the default timeout, and that works for me too. But I do not want to change it for everybody. Is it possible to change the timeout directly in the pipeline? The documentation does not cover this.