Changes to images published by Pipelines are not tested in the PR where they are introduced
Pipelines builds and publishes several images: git-init, pullrequest-init, kubeconfigwriter, and imagedigestexporter. These images are considered part of the Pipelines API.
Some of these images are used in the examples tests, but the examples tests pull versions of these images that have already been published. This means that breaking changes to these images aren't detected in the PRs that introduce them; for example, #4758 introduced a breaking change to the git-init image, causing examples tests to break after the image was published (#4711).
There are two potential ways we could address this problem:
1. Modify our examples tests to build and run these images based on the contents of the PR, instead of the version of the image already published.
2. As suggested in TEP-0074, create a new repo for the images we're publishing. I'm not sure if this would be appropriate for all of the images, or just the ones used in PipelineResources.
(1) sounds generally better to me, but it makes the examples a bit less useful since you can't copy/paste them without doing the build yourself.
I'm not sure (2) solves the issue of testing them at the PR's state, since we'd still have to build/publish them during PR testing.
Maybe there's an option (3) (or 1.5): have an example for git-init etc. that uses the newly-built-from-PR code, but doesn't live in examples. Then the git-init example can remain as our test that the PR doesn't revert behavior the example depends on (as it did in #4758).
I think there was an issue at some point around examples versus tests.
If we are using examples as a set of e2e test payloads, we should definitely make sure we use ko and ko:// to exercise the images we own in the PR. We should even have a check that errors out if it finds git-init or other images we own referenced without ko://.
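As a rough sketch of what a ko-exercised test payload could look like (the import path and the args here are assumptions based on this repo's layout, not a prescribed form):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: git-init-from-pr
spec:
  steps:
    # ko:// tells ko to build this image from the checked-out source
    # (i.e. the PR under test) instead of pulling a published tag, so
    # the payload must be applied with `ko apply`/`ko resolve` rather
    # than plain `kubectl apply`.
    - name: clone
      image: ko://github.com/tektoncd/pipeline/cmd/git-init
      args: ["-url", "https://github.com/tektoncd/pipeline", "-path", "/workspace/repo"]
```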
The trick is that examples are usually seen by end users as an example of what they can write and run. Usually, this means they will try to do kubectl apply -f examples/…, which will break for the examples that are using our images.
I would suggest we split the examples into 2 buckets:
- one under tests/yaml (or something) that uses ko:// and exercises the code that changed in the PR
- one under examples/* that is as today
Both can still run as part of our e2e tests, but it would be clear to users which ones are examples and which ones are payloads for tests.
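For contrast, a sketch of what the examples/* bucket would keep looking like; the published image path and tag below are illustrative assumptions:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: git-init-example
spec:
  steps:
    # A pinned, already-published image: plain `kubectl apply -f` works,
    # but the image is always at least one release behind the PR under test.
    - name: clone
      image: gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:v0.33.0
      args: ["-url", "https://github.com/tektoncd/pipeline", "-path", "/workspace/repo"]
```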
So git-init is tested in https://github.com/tektoncd/pipeline/blob/2b8538214169ec1c3af14221f44ea59e6b1b81f7/test/yamls/v1beta1/pipelineruns/pipelinerun.yaml#L83 - and it was updated in #4758 with the runAsUser: 0 change. So that issue slipped through not because it wasn't being tested, but because the breakage got worked around in one place and the impact elsewhere was missed.
I would definitely lean towards moving more stuff under test/yamls - in some cases, that'll result in duplication, like between test/yamls/v1beta1/pipelineruns/pipelinerun.yaml and examples/v1beta1/pipelineruns/pipelinerun.yaml, but I don't think that's a bad thing, since we do want to make sure that all of our user-facing examples actually pass too. Ideally that would also include testing examples that use catalog tasks directly (rather than the copied-in old version of a catalog task in https://github.com/tektoncd/pipeline/blob/2b8538214169ec1c3af14221f44ea59e6b1b81f7/examples/v1beta1/pipelineruns/pipelinerun.yaml#L39-L41), but that would mean getting OCI bundles on by default.
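For reference, a sketch of what referencing a catalog task via an OCI bundle could look like, assuming the enable-tekton-oci-bundles feature flag is enabled (the bundle path and tag here are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: bundle-example
spec:
  pipelineSpec:
    workspaces:
      - name: shared
    tasks:
      - name: fetch-repo
        taskRef:
          # Resolved from the catalog's published OCI bundle instead of a
          # copied-in (and possibly stale) local copy of the task.
          name: git-clone
          bundle: gcr.io/tekton-releases/catalog/upstream/git-clone:0.7
        params:
          - name: url
            value: https://github.com/tektoncd/pipeline
        workspaces:
          - name: output
            workspace: shared
  workspaces:
    - name: shared
      emptyDir: {}
```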
Whoops, thanks Andrew, looks like I was wrong about the git-init image! I just did a quick search and it looks like while the git-init and imagedigestexporter images have coverage in examples, pullrequest-init and kubeconfigwriter don't.
(There are also a few other images we publish: workingdirinit, entrypoint, nop, etc. If entrypoint broke, integration tests would be unlikely to work, and nop seems hard to break, so I'm wondering if we need a different approach for each of the images?)
Maybe we can add test coverage for all of our images in test/yamls, keep the examples/ yamls in place as documentation, and update our PR standards to indicate that breaking changes to the images should be discussed beforehand? Perhaps as part of #4654; not suggesting that this would need a TEP, just that we should avoid making breaking changes to the images unless it's necessary.
I don't think we want to stop running everything in examples, since we should definitely be sure that any of the examples we're showing to users actually work, but adding test/yamls/... test coverage for the various images is a fantastic idea. We might also want to split out the execution of TestYamls and TestExamples, so that the test/yamls ones show up more clearly as a separate thing: run TestYamls before TestExamples and fail the whole build if TestYamls fails, rather than lumping any TestYamls and/or TestExamples failures together in the log. It might even make sense to rename TestYamls to TestImages or something like that, to more clearly emphasize the purpose of the test.
@lbernick @abayer yes, I agree with those suggestions 👼🏼
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.