pipeline icon indicating copy to clipboard operation
pipeline copied to clipboard

Set podTemplate on Tasks to enable multi-arch builds with Matrix

Open dorzel opened this issue 9 months ago • 18 comments

Changes

Hello! @jeffdyoung and I are working towards implementing the feature requested in: https://github.com/tektoncd/pipeline/issues/6742, namely enabling Tekton to do builds on clusters with nodes of multiple architectures by enabling podTemplate to be set on Tasks.

This currently is an extremely basic approach without any of the supporting tests, validation, and other related code changes in order to show a proof of concept and get some feedback on the general approach before creating a TEP. This will of course later also be cleaned up to follow https://github.com/tektoncd/pipeline/blob/main/CONTRIBUTING.md.

These minimal changes do currently work to accomplish the goal, with an example Pipeline like:

kind: PipelineRun
metadata:
  generateName: matrixed-pipelinerun-
spec:
  pipelineSpec:
    tasks:
      - name: build-and-push-manifest
        matrix:
          params:
          - name: arch
            value: 
              - "amd64"
              - "arm64"
        taskSpec:
          results:
            - name: manifest
              type: string
          params:
            - name: arch
          podTemplate:
            nodeSelector:
              kubernetes.io/arch: $(params.arch)
          steps:
            - name: build-and-push
              image: ubuntu
              script: |
                echo "building on $(params.arch)"
                echo "testmanifest-$(params.arch)" | tee $(results.manifest.path)
      - name: create-manifest-list
        params:
          - name: manifest
            value: $(tasks.build-and-push-manifest.results.manifest[*])
        taskSpec:
          steps:
            - name: echo-manifests
              image: ubuntu
              args: ["$(params.manifest[*])"]
              script: echo "$@"

Feedback appreciated, thanks!

Fixes #6742

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [x] Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • [x] Has Tests included if any functionality added or changed
  • [x] pre-commit Passed
  • [ ] Follows the commit message standard
  • [ ] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [x] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [x] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Tasks now support specifying `podTemplate` fields. `podTemplate` fields on Tasks support variable substitution and fan-out via Matrix.

dorzel avatar Feb 26 '25 16:02 dorzel

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: dorzel / name: Dylan Orzel (f44cec665562a875ae2a431f0a20c7b715bc48cf, 8ea3bd854824c679c246423e4a9f544d0c1d3b22, c22bec2c4158253f6c7931b38ca37f7b721c0c76)

/kind feature

afrittoli avatar Feb 26 '25 16:02 afrittoli

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 93.3% 92.9% -0.3
pkg/reconciler/taskrun/resources/apply.go 99.4% 98.7% -0.6

tekton-robot avatar Feb 26 '25 17:02 tekton-robot

cc: @aleskandro @Prashanth684 @arewm @adambkaplan

jeffdyoung avatar Feb 26 '25 17:02 jeffdyoung

/assign

waveywaves avatar Apr 07 '25 09:04 waveywaves

@dorzel thank you for opening the draft PR, not sure if a TEP is required as the issue https://github.com/tektoncd/pipeline/issues/6742 mentioned doesn't mention a requirement for one. But I will defer to the rest of the team for views on this.

waveywaves avatar Apr 07 '25 09:04 waveywaves

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 64.7% -34.7

tekton-robot avatar Apr 16 '25 19:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Apr 22 '25 18:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Apr 25 '25 18:04 tekton-robot

Ok, I pushed up some updates to this for tests as well as param substitution on all podTemplate fields (at least where it made sense. Bool and Int fields I have left out). Not sure if I covered all the needed areas for testing, let me know.

@chmouel @waveywaves Would you be able to take another look at this?

dorzel avatar Apr 28 '25 18:04 dorzel

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Apr 28 '25 18:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar May 06 '25 18:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar May 12 '25 17:05 tekton-robot

@vdemeester thanks for taking a look! Just so I understand correctly, is the issue here with podTemplate existing at all on Tasks, or just certain parts of it (though presumably, one could specify any field of the podTemplate even if param substitution wasn't supported for it)? There is a preference for keeping these on TaskRun/PipelineRun due to those holding the runtime information?

dorzel avatar May 28 '25 17:05 dorzel

@vdemeester thanks for taking a look! Just so I understand correctly, is the issue here with podTemplate existing at all on Tasks, or just certain parts of it (though presumably, one could specify any field of the podTemplate even if param substitution wasn't supported for it)? There is a preference for keeping these on TaskRun/PipelineRun due to those holding the runtime information?

I agree that, generally speaking, runtime information does not belong with Tasks. However, certain bits of information, like the required architecture, can be inherent to the Task itself, and we should not require users (those who execute the Task) to set it on the runtime every time.

When Tekton runs on Kubernetes, the information about the architecture required by a Task is implemented via the PodTemplate, a Kubernetes-specific, runtime-specific piece of information; hence, a small dilemma arises.

I see two options:

  1. Define a Tekton-specific API to capture a Task target architecture. Right now, this is done through annotations, maybe it's enough? With that, implement a change in the controller that automatically adds the required affinity to Pods.
  2. Allow a subset of the podTemplate ( namely podTemplate/nodeSelector:kubernetes.io/arch) to be set on Tasks

I think option (1) using existing annotations could be relatively easy to implement, although we would have ot deal with some complexity in case:

  • the task annotations includes multiple architectures
  • the podTemplate from the taskrun includes the arch based node selector as well

afrittoli avatar Jun 05 '25 13:06 afrittoli

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Jun 09 '25 19:06 tekton-robot

I agree that, generally speaking, runtime information does not belong with Tasks. However, certain bits of information, like the required architecture, can be inherent to the Task itself, and we should not require users (those who execute the Task) to set it on the runtime every time.

When Tekton runs on Kubernetes, the information about the architecture required by a Task is implemented via the PodTemplate, a Kubernetes-specific, runtime-specific piece of information; hence, a small dilemma arises.

That's true. It could be inherent to the Task itself, or via the Pipeline I think.

I see two options:

1. Define a Tekton-specific API to capture a Task target architecture. Right now, this is done through annotations, maybe it's enough? With that, implement a change in the controller that automatically adds the required affinity to Pods.

2. Allow a subset of the `podTemplate` ( namely `podTemplate/nodeSelector:kubernetes.io/arch`) to be set on `Tasks`

I think option (1) using existing annotations could be relatively easy to implement, although we would have ot deal with some complexity in case:

* the task annotations includes multiple architectures

* the podTemplate from the taskrun includes the arch based node selector as well

Indeed, there could be cases where the same exact task needs to run for multiple architecture (hence the task itself would not be the "holder" of the architecture information). I even think it's going to be the main use case. @afrittoli

Maybe it is the PodTemplate notion that bothers me, but I agree, we need a Tekton-specific API to capture the Task/PipelineTask target architecture (especially in conjunction with the matrix feature).

I "like" the annotation approach (put an annotation on a TaskRun, and it will be running on a given arch), but today, there is no ways to pass an annotation for a specific Taskrun in a Pipeline (PipelineTask). If we had that, we could use that with matrix to create TaskRun from Pipeline that would have that annotation. The controller could even "warn" (or be configured to be strict and fail) if the arch annotation is not listed in the supported architecture of the task.

vdemeester avatar Jun 10 '25 13:06 vdemeester

Thanks for the input everyone - after some discussion I think the way we want to take this forward is to see if we can use the current functionality of this PR on TaskRunSpecs instead, as mentioned here https://github.com/tektoncd/pipeline/pull/8599#pullrequestreview-2856878743

dorzel avatar Jun 24 '25 15:06 dorzel

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Jun 24 '25 17:06 tekton-robot

I drafted a related proposal in Shipwright on creating an API that is specific for running multi-arch builds - SHIP-0043. Linking here mainly to help us on the Shipwright side "keep track" of these capabilities on the Tekton side. I see the matrix support here (whether at the Pipeline or TaskRunSpec level) complementing, rather than competing with, the work that is described for Shipwright.

adambkaplan avatar Jun 30 '25 18:06 adambkaplan

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.6% 0.3

tekton-robot avatar Jul 08 '25 20:07 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 86.0% -13.3
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 09 '25 15:07 tekton-robot

Ok, I've pushed up the changes for using TaskRunSpecs instead. Ready for review on that.

@vdemeester for the -* on pipelineTaskName in https://github.com/tektoncd/pipeline/pull/8599#issuecomment-2959292473, I'm curious as to what your vision for that was? Is it just a glob pattern in order to match multiple task names? I had a rough implementation but left it out for now since as things are right now, this works for fanning out while only writing a single task.

dorzel avatar Jul 09 '25 16:07 dorzel

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.2% -0.2
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 09 '25 16:07 tekton-robot

/retest

afrittoli avatar Jul 18 '25 12:07 afrittoli

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.2% -0.2
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 18 '25 12:07 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.2% -0.2
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 18 '25 17:07 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.2% -0.2
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 18 '25 17:07 tekton-robot

e2e tests passed except for an error with my example pipeline where it (assumedly) needed to be in the beta folder. Waiting on retest to confirm that worked

dorzel avatar Jul 23 '25 16:07 dorzel

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/apply.go 99.4% 99.2% -0.2
pkg/reconciler/taskrun/taskrun.go 85.9% 86.1% 0.3

tekton-robot avatar Jul 23 '25 16:07 tekton-robot