Overriding Pipeline timeout does not work

Open MarijnJV opened this issue 1 year ago • 16 comments

Expected Behavior

A PipelineRun created by a Pipeline should not time out after 1 hour, but rather after 2h30m.

I used the following configuration:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-name
spec:
  tasks:
    - name: task-name
      taskRef:
        kind: Task
        name: task-name
      timeout: "2h30m0s"

Actual Behavior

A PipelineRun created by a Pipeline times out after 1 hour:

PipelineRun "pipeline-name-id" failed to finish within "1h0m0s"

The PipelineRun yaml configuration contains the following:

spec:
  pipelineRef:
    name: pipeline-name
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s

The pipeline timeout is not overwritten and the pipeline fails.

Steps to Reproduce the Problem

  1. Create a Pipeline with a task that takes over an hour to complete.
  2. Set the timeout of the task to more than 1 hour (as shown in the "Expected Behavior" section).
  3. Run the Pipeline and check the created PipelineRun config for the timeout limits.

Additional Info

I am running a pipeline with 2 tasks, one of which takes longer than an hour. I have specified the timeout limit for this task. However, when a PipelineRun is created by the Pipeline, the default value of 1 hour is not overridden. I have followed the documentation on how to set a timeout for a pipeline, which seems to have to be done at the task level.

In the PipelineRun yaml, created by the Pipeline, both:

tasks:
      - name: task-name
        taskRef:
          kind: Task
          name: task-name
        timeout: 2h30m0s

and

spec:
  pipelineRef:
    name: pipeline-name
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s

are present. Because the pipeline timeout is shorter than the task timeout, the pipeline will fail after an hour.
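For illustration, a minimal sketch of a PipelineRun that sets the pipeline-level timeout explicitly, so that it is no shorter than the task timeout. The `generateName` prefix and the `3h0m0s` value are assumptions chosen for this example; they do not come from the actual setup.

```yaml
# Sketch only: a PipelineRun whose pipeline-level timeout exceeds the
# 2h30m0s task timeout configured in the Pipeline.
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: pipeline-name-   # hypothetical prefix
spec:
  pipelineRef:
    name: pipeline-name
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 3h0m0s             # assumed value, must be >= the longest task timeout
```

Because it uses `generateName`, a manifest like this would be submitted with `kubectl create -f` rather than `kubectl apply -f`.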

I have also tried to set the default timeout value to 2 hours via a ConfigMap. This did not work either:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  default-timeout-minutes: "150"

This still resulted in a 1 hour timeout limit.

PipelineRun

Using the following configuration for a PipelineRun does work. In this case the default value of 1 hour is overwritten by 2h40m. However, I would like to not have to create my PipelineRuns manually.

apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: trigger
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: triggered
      spec:
        pipelineRef:
          name: pipeline-name
        timeout: "2h40m0s"
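The template above uses the `v1beta1` PipelineRun field `spec.timeout`. A sketch of what the same template might look like with a `tekton.dev/v1` PipelineRun, where the pipeline-level timeout lives under `spec.timeouts.pipeline`, is shown below. This is an untested adaptation of the configuration above, not something taken from the running cluster.

```yaml
# Sketch only: the TriggerTemplate above, adapted to a tekton.dev/v1 PipelineRun.
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: trigger
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1
      kind: PipelineRun
      metadata:
        generateName: triggered
      spec:
        pipelineRef:
          name: pipeline-name
        timeouts:
          pipeline: "2h40m0s"   # replaces the v1beta1 spec.timeout field
```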

  • Kubernetes version: v1.27.10+28ed2d7 (OpenShift 4.4)

    Output of kubectl version:

Client Version: v1.27.4
Kustomize Version: v5.0.1
Server Version: v1.27.10+28ed2d7
  • Tekton Pipeline version: tekton.dev/v1beta1

MarijnJV avatar Jul 12 '24 08:07 MarijnJV

Using the following configuration for a PipelineRun does work. In this case the default value of 1 hour is overwritten by 2h40m. However, I would like to not have to create my PipelineRuns manually.

How is the PipelineRun created in your setup? I am asking because it is most likely up to the "thing" that creates the PipelineRun to set the timeouts correctly.

Tekton Pipeline version: tekton.dev/v1beta1

v1beta1 is the API version; we also need the pipeline instance version (tkn version should display this).

vdemeester avatar Jul 12 '24 10:07 vdemeester

The PipelineRun where the timeout value is correct is created with a CronJob and an EventListener. A PipelineRun created by a Pipeline does not seem to set the timeout value correctly.

I hope that answers your question regarding how the PipelineRun is created.

tkn version:

Client version: 0.33.0
Chains version: v0.19.0
Pipeline version: v0.53.3
Triggers version: v0.25.3
Operator version: v0.69.1

Edit: To add to the first answer: I use the OpenShift UI to start a PipelineRun, which I assume runs something similar to tkn pipeline start pipeline-name

MarijnJV avatar Jul 15 '24 07:07 MarijnJV

Getting a similar issue on Kubernetes v1.32.0 🙄

As of pipelines v0.66.0 and apiVersion: tekton.dev/v1 the behavior has changed a bit.

Tekton now uses 1h0m0s as a hard upper timeout: it is possible to decrease the task timeout, but the run fails if the task takes more than 1 hour to return, e.g.:

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: default-cached-pipeline
  namespace: ci
spec:
  description: This pipeline clones a git repo, builds a Docker image with Kaniko
  ...
    - name: build-push-working
      runAfter:
        - clone
      timeout: 0h1m0s  # <--- this works
      taskRef:
        name: kaniko
  ...
    - name: build-push-failing
      runAfter:
        - clone
      timeout: 4h0m0s  # <--- this fails
      taskRef:
        name: kaniko
  ...

Client version: 0.39.0
Pipeline version: v0.66.0
Dashboard version: v0.53.0

tampler avatar Jan 04 '25 09:01 tampler

@vdemeester do you know someone who'd be interested in looking into this issue?

afrittoli avatar Jan 04 '25 11:01 afrittoli

I can look into this @afrittoli @vdemeester

aThorp96 avatar Jan 20 '25 15:01 aThorp96

Now tekton uses 1h0m0s as an hard upper timeout , providing a way to decrease the task timeout, but fails if the task takes more than 1hour to return

@tampler In your example, did you attempt to configure the Pipeline Run's timeout as well or were you just setting the Pipeline's Tasks' timeouts? Per the Pipeline Run docs, the Pipeline Run's timeout (defaulting to 1 hour) supersedes the Pipeline's Tasks' timeouts. So if your Pipeline Run does not specify a pipeline timeout I believe the behavior is expected.
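To make the relationship concrete, here is a hedged sketch of a PipelineRun for the Pipeline quoted above with a pipeline-level timeout large enough for the 4h0m0s task. The `generateName` prefix and both timeout values are assumptions for illustration. `timeouts.tasks` is optional and, when set, must not exceed `timeouts.pipeline`.

```yaml
# Sketch only: pipeline-level timeouts sized to accommodate a 4h0m0s task timeout.
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: default-cached-pipeline-run-   # hypothetical prefix
  namespace: ci
spec:
  pipelineRef:
    name: default-cached-pipeline
  timeouts:
    pipeline: 4h30m0s   # assumed value: overall budget for the whole run
    tasks: 4h15m0s      # assumed value: budget for the tasks section, <= pipeline
```

With a run like this, the per-task `timeout: 4h0m0s` in the Pipeline can take effect, because the run-level budget no longer expires first.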

@MarijnJV it seems like your issue is: given a Pipeline which has a Task with a timeout longer than the default PipelineRun pipeline-timeout, when you created the PipelineRun via OpenShift's UI it did not override the default Pipeline Run pipeline-timeout (and possibly hard-coded the default of 1h0m0s as the pipeline's timeout?). This led to your Pipeline Run timing out after 1 hour, and subsequently the Task was cancelled. Is that correct?

aThorp96 avatar Jan 20 '25 16:01 aThorp96

@aThorp96 Thanks for looking into this 👀

  1. I set a global default timeout in the tekton-pipelines/config-defaults config map, bumping the default timeout to 600 mins.
  2. I specified the task timeout in the pipeline as I showed above (see the quote).

The thing is that the timeout directive WORKS. However, there's a hardcoded upper limit somewhere in your code base, which you should move into the global config map.

As I said, decreasing the timeout to a lower value (say, 3 mins) works fine. Bumping the timeout to 1h5m0s won't work due to your hardcoded ceiling value.

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: default-cached-pipeline
  namespace: ci
spec:
  description: This pipeline clones a git repo, builds a Docker image with Kaniko
  ...
    - name: build-push-failing
      runAfter:
        - clone
      timeout: 4h0m0s  # <--- this fails
      taskRef:
        name: kaniko
  ...

Client version: 0.39.0
Pipeline version: v0.66.0
Dashboard version: v0.53.0

tampler avatar Jan 20 '25 17:01 tampler

@tampler I was unable to reproduce this issue, where the configured timeout-minutes is not applied and not respected, with the following setup:

Client version: 0.39.0
Pipeline version: v0.66.0
Triggers version: v0.30.1
Dashboard version: v0.53.0

Pipeline:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-timeout-test
spec:
  tasks:
    - name: test-long-timeout
      taskSpec:
        steps:
          - image: quay.io/quay/busybox
            script: |
              while true; do
                echo "$(date) beep"
                sleep 1
                echo "$(date) boop"
                sleep 1
              done
      timeout: "1h30m0s"

Defaults-config:

apiVersion: v1
data:
  default-timeout-minutes: "150"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: {...}
  creationTimestamp: "2025-01-20T19:07:57Z"
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
  name: config-defaults
  namespace: tekton-pipelines
  resourceVersion: "11503"
  uid: b59e9e77-c985-4225-81d8-86047e18de90

After applying the above, when creating a Pipeline Run via both the Tekton Dashboard and tkn pipeline start, the Pipeline Run was created correctly with the specified default timeout of 150 minutes.

Pipeline Run:

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  creationTimestamp: "2025-01-20T19:56:55Z"
  generation: 1
  labels:
    tekton.dev/pipeline: pipeline-timeout-test
  name: test-with-more-than-hour-timeout
  namespace: default
  resourceVersion: "11717"
  uid: 1354c958-58d5-45f2-8a12-2feff8293cdb
spec:
  pipelineRef:
    name: pipeline-timeout-test
  taskRunTemplate:
    serviceAccountName: default
  timeouts:
    pipeline: 2h30m0s   # <- note the pipeline-level timeout
status:
  childReferences:
  - apiVersion: tekton.dev/v1
    kind: TaskRun
    name: test-with-more-than-hour-timeout-test-long-timeout
    pipelineTaskName: test-long-timeout
  conditions:
  - lastTransitionTime: "2025-01-20T19:56:55Z"
    message: 'Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped:
      0'
    reason: Running
    status: Unknown
    type: Succeeded
  pipelineSpec:
    tasks:
    - name: test-long-timeout
      taskSpec:
        metadata: {}
        spec: null
        steps:
        - computeResources: {}
          image: quay.io/quay/busybox
          name: ""
          script: |
            while true; do
              echo "$(date) beep"
              sleep 1
              echo "$(date) boop"
              sleep 1
            done
      timeout: 1h30m0s  # <- the task-specific timeout

Similarly, if I switched the Task timeout (defined in the Pipeline) to be greater than the Pipeline timeout (defined in the Pipeline Run), then the timeouts behaved as expected: the Pipeline timed out before the Task's configured timeout.

Two things to note though:

  • @MarijnJV you mention "how to set a timeout for a pipeline, which seems to have to be done on the task-level". I think this is the source of some confusion here. The Pipeline specifies the timeouts for each Task, but the Pipeline Run specifies the timeout for the Pipeline, and the two timeouts are orthogonal. If the Pipeline timeout lapses, the Task timeout is not relevant, as all of the Pipeline Run's Task Runs are immediately stopped as "timed-out". This could be clearer in the Pipeline docs you linked, and is more clearly explained in the Pipeline Run docs. I can maybe improve the Pipeline's timeout docs so that this distinction is clearer.

  • @tampler Given the above, I was looking into how a PipelineRun may be created without the default-timeout-minutes being applied, and I was unable to do so with valid configuration. What I did notice, however, was that the config-defaults configmap's pre-populated values are not applied. The values in the config-map are by default nested under an _example key. So if you just changed the value in the config map from "60" to "150", it will have no effect. You need to move default-timeout-minutes out of _example so that it is at .data.default-timeout-minutes. After doing that, all pipeline-runs were created with the 150-minute pipeline-timeout. If you still experience the issue after ensuring the configmap is correct, do you mind manually creating the pipeline run using tkn pipeline start <pipeline name> --pipeline-timeout "2h30m0s" and confirming that the pipeline run still times out after 1h0m0s?
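The `_example` point in the second bullet can be sketched as follows. The `_example` content is abridged and paraphrased here, so treat its exact text as an assumption. The part that matters is which key sits directly under `.data`.

```yaml
# Sketch only: config-defaults with the setting at the level the controller reads.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  # Effective: a top-level key under .data
  default-timeout-minutes: "150"
  # Not effective: the documentation-only block that ships with the install.
  # Editing the value inside this string changes nothing.
  _example: |
    default-timeout-minutes: "60"  # 60 minutes
```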

aThorp96 avatar Jan 20 '25 20:01 aThorp96

@aThorp96 Thanks for a deep dive into this. Two things to note immediately

  1. You tested with apiVersion: tekton.dev/v1beta1, not apiVersion: tekton.dev/v1
  2. I didn't set up the top-level timeout in the task run.

I'll get back to you after applying your example and retesting on my setup. Thanks for your support 🙏

tampler avatar Jan 20 '25 21:01 tampler

@aThorp96 I have not been able to reproduce this bug either, using your code example and a deadline in both the pipeline and the pipeline run.

I see you have already updated the docs for this issue. I guess the issue can be closed ✌️

tampler avatar Jan 22 '25 20:01 tampler

Great to hear!

@MarijnJV does the above address your original issue as well?

aThorp96 avatar Jan 23 '25 18:01 aThorp96

It has been a while, but I will look into it.

MarijnJV avatar Jan 24 '25 08:01 MarijnJV

@aThorp96 I am able to reproduce it with the following configuration:

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: renovate
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
  default-timeout-minutes: "250" # 4h10m

Pipeline:

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: renovate-full
  namespace: renovate
spec:
  tasks:
    - name: run-renovate
      taskRef:
        kind: Task
        name: run-renovate
      timeout: 2h30m0s

This results in the following PipelineRun when I start it from the dashboard:

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: renovate-full-3a43jg
  generation: 1
  namespace: renovate
  finalizers:
    - chains.tekton.dev/pipelinerun
  labels:
    tekton.dev/pipeline: renovate-full
spec:
  pipelineRef:
    name: renovate-full
  taskRunTemplate:
    serviceAccountName: pipeline
  timeouts:
    pipeline: 1h0m0s  # <-- This is the default timeout, but should be 4h10m (from configmap)
status:
  childReferences:
    - apiVersion: tekton.dev/v1
      kind: TaskRun
      name: renovate-full-3a43jg-run-renovate
      pipelineTaskName: run-renovate
  completionTime: '2025-01-24T10:02:18Z'
  conditions:
    - lastTransitionTime: '2025-01-24T10:02:18Z'
      message: 'Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0'
      reason: Succeeded
      status: 'True'
      type: Succeeded
  pipelineSpec:
    tasks:
      - name: run-renovate
        taskRef:
          kind: Task
          name: run-renovate
        timeout: 2h30m0s  # <-- task timeout is correct

However, when using tkn pipeline start <pipeline name> --pipeline-timeout "2h30m0s", the timeout is correctly set. So the ConfigMap may be the root of the problem. It could also just be the case that I have not configured it correctly.
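One detail that may be worth checking: the ConfigMap above lives in the `renovate` namespace, while the earlier examples in this thread place `config-defaults` in `tekton-pipelines`. The Tekton controller reads `config-defaults` from the namespace it is installed in, so a copy in an application namespace would not be picked up. A sketch, assuming a default installation in `tekton-pipelines` (operator-managed installs such as OpenShift Pipelines may use a different namespace and may manage this ConfigMap themselves):

```yaml
# Sketch only: the same default, placed in the controller's namespace.
# The namespace is an assumption; use whichever namespace runs the Tekton controller.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
  default-timeout-minutes: "250" # 4h10m
```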

Client version: 0.33.0
Chains version: v0.20.1
Pipeline version: v0.59.4
Triggers version: v0.27.0
Operator version: v0.71.0

MarijnJV avatar Jan 24 '25 10:01 MarijnJV

@MarijnJV can you reproduce this by running tkn pipeline start <pipeline name>? This could be a CLI issue, where the CLI would use the "default timeout" of 1h from code instead of reading the configmap, but looking at the code, that shouldn't be the case 🤔.

vdemeester avatar Jan 24 '25 10:01 vdemeester

Pipeline version: v0.59.4

The current version is v0.66.0 🙄. @MarijnJV pls bump your versions and try again.

tampler avatar Jan 24 '25 10:01 tampler

I would like to ask a question here, as I have tested this behaviour: is it possible to set the timeout for the whole pipeline directly in the Pipeline definition?

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: testpipelinetimeout
  namespace: testns
spec:
  tasks:
    - name: testpipelinetimeout
      taskRef:
        kind: Task
        name: testpipelinetimeout
      timeout: 2h0m0s

This definition sets the timeout for the task to 2 hours. The pipeline timeout is taken from the default setting in the config map, which is 1h. This works as described.

I can also change the default timeout, and that works for me too, but I do not want to change it for everybody. Is it possible to change the timeout directly in the Pipeline? The documentation does not cover this.

vlskrbek avatar Feb 27 '25 19:02 vlskrbek