Chains not pushing signed image to Harbor

Open ChrisJBurns opened this issue 2 years ago • 10 comments

Expected Behavior

  • Kaniko builds image in pipeline
  • Image is pushed to Harbor
  • Chains starts to run and tries to pull the image to sign it
  • Signs image and uploads attestations

Actual Behavior

  • Kaniko builds image in pipeline
  • Image is pushed to Harbor
  • Chains starts to run and tries to pull the image to sign it
  • Chains gets an UNAUTHORIZED error from Harbor

Steps to Reproduce the Problem

  1. Install Harbor v2.5.0
  2. Install Tekton Chains
  3. Run a basic Kaniko task that builds and pushes image to Harbor
  4. Chains runs but fails to pull image

Additional Info

  • Followed documentation https://tekton.dev/docs/chains/signed-provenance-tutorial/

Actual error received in the Chains log:

type: 'Warning' reason: 'InternalError' 1 error occurred:\n\t* POST https://[HARBOR_URL]/v2/create/test-project/blobs/uploads/: UNAUTHORIZED: unauthorized to access repository: create/test-project, action: push: unauthorized to access repository: create/test-project, action: push\n\n"

It's worth mentioning that I've tried all sorts of things. The dockerconfigjson file used by the build-bot SA that the pipeline run uses has the main Harbor admin credential, so I don't know why I'm getting unauthorized. I've also tried making the repository public, but that doesn't seem to work either - that is when I get the above error. If it's not public, I get an error saying that it cannot pull the image due to permissions.
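One way to rule out a malformed secret is to decode the .dockerconfigjson payload and confirm it really holds an entry for the registry host. The following is a minimal sketch, not part of the original report: the function name `registry_credentials`, the host `harbor.example.com` (standing in for [HARBOR_URL]) and the sample credentials are all hypothetical. It also assumes the `auths` key matches the host exactly, whereas real configs sometimes key entries by a full URL.

```python
import base64
import json


def registry_credentials(dockerconfigjson: str, registry_host: str):
    """Return (username, password) for registry_host, or None if no entry exists."""
    config = json.loads(dockerconfigjson)
    entry = config.get("auths", {}).get(registry_host)
    if entry is None:
        return None
    if "auth" in entry:
        # The "auth" field is base64("username:password").
        decoded = base64.b64decode(entry["auth"]).decode()
        username, _, password = decoded.partition(":")
        return username, password
    # Some configs store the fields separately instead of a combined "auth".
    return entry.get("username"), entry.get("password")


# Hypothetical sample; harbor.example.com stands in for [HARBOR_URL].
sample = json.dumps({
    "auths": {
        "harbor.example.com": {
            "auth": base64.b64encode(b"admin:not-a-real-password").decode()
        }
    }
})

print(registry_credentials(sample, "harbor.example.com"))
print(registry_credentials(sample, "quay.io"))
```

If the lookup returns None for the registry the image is pushed to, the client has no credentials to present for that host, which would be consistent with the registry treating the request as unauthenticated.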

  • Kubernetes version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"633e1432a197dc4a6a1807311429a9808b99ff5b", GitTreeState:"clean", BuildDate:"2022-03-30T18:27:54Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

  • Tekton Pipeline version:

tkn version
Client version: 0.23.1
Chains version: v0.8.0
Pipeline version: v0.34.1
Triggers version: v0.19.1

ChrisJBurns avatar Apr 20 '22 22:04 ChrisJBurns

A bit of an update to make things easier: instead of pasting all of the pipeline YAMLs, I have applied the following kaniko task inside the tekton-ci namespace: https://github.com/tektoncd/chains/blob/main/examples/kaniko/kaniko.yaml

I also have a dockerconfigjson secret in that namespace that contains the Harbor admin creds.

I reference it when running the above task like so:

tkn task start --param IMAGE=[HARBOR_URL]/create/kaniko-chains --use-param-defaults --workspace name=source,emptyDir="" --workspace name=dockerconfig,secret=flux-harbor-pull-secret kaniko-chains -n tekton-ci

and the logs of the kaniko task container are the following:

 tkn taskrun logs kaniko-chains-run-z5ndh -f -n tekton-ci
[add-dockerfile] FROM alpine@sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f

[build-and-push] E0420 22:07:11.441029      12 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
[build-and-push] 	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
[build-and-push] error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "[HARBOR_URL]/create/kaniko-chains": POST https://[HARBOR_URL]/v2/create/kaniko-chains/blobs/uploads/: UNAUTHORIZED: unauthorized to access repository: create/kaniko-chains, action: push: unauthorized to access repository: create/kaniko-chains, action: push

[write-url] 2022/04/20 22:07:12 Skipping step because a previous step failed

Now the weird thing is that when I run my own pipelines, the image is built successfully and pushed to Harbor, but for some reason the Tekton Chains part still fails.

ChrisJBurns avatar Apr 20 '22 22:04 ChrisJBurns

OK, after a bit more debugging, it seems I've made a mistake somewhere around the service account permissions, namely the difference between the imagePullSecrets and the secrets of the service account. I will post a better description in the coming comments.

ChrisJBurns avatar Apr 21 '22 11:04 ChrisJBurns

So, after a bit more investigation, it seems that for Tekton Chains to work, the service account attached to the PipelineRun (the one that ends up performing the kaniko task) has to have both imagePullSecrets and secrets referencing the .dockerconfigjson secret for Harbor.

Before, it only had the following:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-bot
  namespace: tekton-ci
secrets:
  - name: ssh-secret
  - name: flux-harbor-pull-secret

This worked when pushing images to Harbor, but not when it needed to retrieve the image in order to sign it with Cosign. I had to add the imagePullSecret like the below for this to work.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-bot
  namespace: tekton-ci
secrets:
  - name: ssh-secret
  - name: flux-harbor-pull-secret
imagePullSecrets:
  - name: flux-harbor-pull-secret

It's worth adding this to the documentation, as it doesn't mention the need for both a secret and an imagePullSecret to be attached to the service account. There's an inherent assumption that it is to be attached to the service account that is attached to the Tekton Chains controller.
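The difference between the two manifests above can be checked mechanically. Here is a small sketch, assuming the ServiceAccount has already been loaded into a Python dict (for example from `kubectl get sa build-bot -o json`). The helper name `secret_linkage` is hypothetical, and the check only reflects the behaviour reported in this thread, not a documented Chains requirement.

```python
def secret_linkage(service_account: dict, secret_name: str) -> dict:
    """Report whether secret_name is listed under secrets and imagePullSecrets."""

    def names(field: str) -> set:
        # Both fields are lists of {"name": ...} objects and may be absent.
        return {item.get("name") for item in service_account.get(field) or []}

    return {
        "secrets": secret_name in names("secrets"),
        "imagePullSecrets": secret_name in names("imagePullSecrets"),
    }


# The two ServiceAccount variants from this comment, as dicts.
before = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": "build-bot", "namespace": "tekton-ci"},
    "secrets": [{"name": "ssh-secret"}, {"name": "flux-harbor-pull-secret"}],
}
after = dict(before, imagePullSecrets=[{"name": "flux-harbor-pull-secret"}])

print(secret_linkage(before, "flux-harbor-pull-secret"))
print(secret_linkage(after, "flux-harbor-pull-secret"))
```

In the first variant the secret is linked only under secrets; in the second it appears under both fields, which is the configuration that worked in this report.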

Happy to keep this open for other comments if needed, otherwise am happy to close it.

ChrisJBurns avatar Apr 21 '22 11:04 ChrisJBurns

Hey @ChrisJBurns thanks for debugging this and reporting your findings! I definitely think it would be useful to add this to the documentation. If you or anyone else could open a PR I'm happy to review it.

Until we fix it, maybe we rename this issue "Add documentation for setting up authentication to Harbor"?

priyawadhwa avatar Apr 21 '22 14:04 priyawadhwa

Hi @priyawadhwa sure thing, I can open a PR for it. However, do we think this is a registry-wide issue, or just Harbor? I'm curious what would happen if someone were to use GCR or something else with a service account that only has the secret in it, and whether both secret and imagePullSecret are needed for all registries. Possibly something for someone to try?

Also, I just want to double check that is it right that Kaniko pushes an image to Harbor first (that is unsigned), Chains then takes over and retrieves the image, signs it, and re-pushes it? This means that a push is needed first, then a retrieve & sign, followed by a last push. Is this the standard behaviour across all implementations and registries? Or is this something that is only applicable to Harbor?

ChrisJBurns avatar Apr 21 '22 17:04 ChrisJBurns

I'm having the same issue and have tried the same steps. The one interesting detail I found by enabling Harbor debug logging is that the incoming push request to Harbor appears to be anonymous. I'm clueless about how Tekton locates and consumes the dockerconfig while the task is running, and am unsure how I might debug it.

GooseYArd avatar Apr 22 '22 20:04 GooseYArd

Hi. We also tried following the documentation on OpenShift 4.9 and we have the same problem. However, in our case we have tried using quay.io and IBM Cloud Container Registry, so it doesn't seem to be an issue affecting only Harbor. In the usage logs of quay.io I can also see that the image is successfully pushed by Kaniko using the robot account defined in the secret linked to the service account, but when Tekton Chains tries to upload the attestations to the registry, the performing user is labeled as "anonymous" and gets the UNAUTHORIZED error. Here is a log extract from the Chains pod:

{"level":"info","ts":"2022-05-20T07:06:36.494Z","logger":"watcher","caller":"oci/oci.go:152","msg":"Starting to upload attestations to OCI ...","commit":"e94c32e","knative.dev/controller":"github.com.tektoncd.chains.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"3f1d4235-a7ea-4163-9d78-524e377007f5","knative.dev/key":"jose-cicd/kaniko-chains-run-4l5v4"}
{"level":"info","ts":"2022-05-20T07:06:36.494Z","logger":"watcher","caller":"oci/oci.go:155","msg":"Starting attestation upload to OCI for quay.io/xxx/kaniko-chains@sha256:757aa924eba28cb7b9143d48d1cba2659a846d367a252377953402150b1a24bf...","commit":"e94c32e","knative.dev/controller":"github.com.tektoncd.chains.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"3f1d4235-a7ea-4163-9d78-524e377007f5","knative.dev/key":"jose-cicd/kaniko-chains-run-4l5v4"}
{"level":"error","ts":"2022-05-20T07:06:37.905Z","logger":"watcher","caller":"chains/signing.go:210","msg":"POST https://quay.io/v2/xxx/kaniko-chains/blobs/uploads/: UNAUTHORIZED: access to the requested resource is not authorized; map[]","commit":"e94c32e","knative.dev/controller":"github.com.tektoncd.chains.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"3f1d4235-a7ea-4163-9d78-524e377007f5","knative.dev/key":"jose-cicd/kaniko-chains-run-4l5v4","stacktrace":"github.com/tektoncd/chains/pkg/chains.(*TaskRunSigner).SignTaskRun\n\t/opt/app-root/src/go/src/github.com/tektoncd/chains/pkg/chains/signing.go:210\ngithub.com/tektoncd/chains/pkg/reconciler/taskrun.(*Reconciler).FinalizeKind\n\t/opt/app-root/src/go/src/github.com/tektoncd/chains/pkg/reconciler/taskrun/taskrun.go:61\ngithub.com/tektoncd/chains/pkg/reconciler/taskrun.(*Reconciler).ReconcileKind\n\t/opt/app-root/src/go/src/github.com/tektoncd/chains/pkg/reconciler/taskrun/taskrun.go:42\ngithub...
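Since the controller emits one JSON object per log entry, the failing entries can be pulled out with a few lines of Python. This is only a sketch: the function name `error_messages` is hypothetical, the sample entries are abbreviated from the extract above, and it assumes each entry sits on its own line (truncated or non-JSON lines are skipped).

```python
import json


def error_messages(log_text: str) -> list:
    """Return (taskrun key, message) pairs for entries logged at level "error"."""
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip truncated or non-JSON lines instead of failing.
            continue
        if isinstance(entry, dict) and entry.get("level") == "error":
            errors.append((entry.get("knative.dev/key"), entry.get("msg")))
    return errors


# Abbreviated versions of the entries shown above.
sample_log = "\n".join([
    json.dumps({
        "level": "info",
        "msg": "Starting to upload attestations to OCI ...",
        "knative.dev/key": "jose-cicd/kaniko-chains-run-4l5v4",
    }),
    json.dumps({
        "level": "error",
        "msg": "POST https://quay.io/v2/xxx/kaniko-chains/blobs/uploads/: "
               "UNAUTHORIZED: access to the requested resource is not authorized; map[]",
        "knative.dev/key": "jose-cicd/kaniko-chains-run-4l5v4",
    }),
    "not a json line",
])

for key, msg in error_messages(sample_log):
    print(key, "->", msg)
```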

I'm actually curious how @ChrisJBurns managed to get it working, because although we added both secret and imagePullSecret to the service account, the signing still fails. However, OpenShift automatically adds an additional secret and imagePullSecret to the service account for the internal OpenShift Container Registry. I'm not sure whether that can be avoided or whether it has something to do with this problem.

@priyawadhwa any suggestions?

Our environment info:

  • OpenShift/Kubernetes:

oc version
Client Version: openshift-clients-4.6.0-202006250705.p0-176-g5797eaeca
Server Version: 4.9.28
Kubernetes Version: v1.22.5+a36406b

  • tkn version

Client version: 0.16.0
Chains version: v0.8.0
Pipeline version: v0.33.2
Triggers version: v0.19.0

jose-hernandez2 avatar May 20 '22 07:05 jose-hernandez2

@ChrisJBurns,

Also, I just want to double check that is it right that Kaniko pushes an image to Harbor first (that is unsigned), Chains then takes over and retrieves the image, signs it, and re-pushes it? This means that a push is needed first, then a retrieve & sign, followed by a last push. Is this the standard behaviour across all implementations and registries? Or is this something that is only applicable to Harbor?

The image itself is always "unsigned" on the registry. If you have OCI storage enabled, Chains will create an image signature and attestation, and push them alongside the image in the registry (this is implemented by cosign). Client tooling, e.g. cosign again, is responsible for making this correlation at consumption time. For example, if I have an image example.com/foo:latest with digest sha256:12345, Chains will create the image signature and attestation, and push them to the registry as OCI artifacts. The repo example.com/foo will have, at least, the following tags:

  • latest <- image built by Kaniko with the digest sha256:12345
  • sha256-12345.sig <- image signature
  • sha256-12345.att <- image attestation

The *.sig and *.att tags point to OCI artifacts that may or may not be supported by the registry. More specifically, the registry must support the application/vnd.dsse.envelope.v1+json OCI artifact MIME type. Docker Hub, Quay.io, and GHCR are ones that do. I wouldn't expect an unauthorized error from a registry that does not support it, but that's ultimately a registry implementation decision. It is worth verifying that Harbor supports this.
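The tag naming in the example above (digest sha256:12345 giving sha256-12345.sig and sha256-12345.att) can be sketched as a small helper, which is handy when checking a repository for the expected tags. The function name `cosign_tags` is hypothetical, and the sketch covers only the two suffixes mentioned here.

```python
def cosign_tags(digest: str) -> dict:
    """Derive the signature and attestation tag names for an image digest."""
    algorithm, sep, hex_value = digest.partition(":")
    if not sep or not algorithm or not hex_value:
        raise ValueError(f"expected '<algorithm>:<hex>', got {digest!r}")
    # The ':' in the digest is replaced with '-' because ':' is not valid in a tag.
    base = f"{algorithm}-{hex_value}"
    return {"signature": f"{base}.sig", "attestation": f"{base}.att"}


tags = cosign_tags("sha256:12345")
print(tags["signature"])
print(tags["attestation"])
```

If neither tag shows up in the repository after the TaskRun finishes, Chains most likely failed before or during the upload, which matches the UNAUTHORIZED errors in the controller logs.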

If you're seeing an unauthorized error from the registry in the chains controller logs, check the following:

  • Does the chains controller trust the identity certificate of the registry? This is usually only the case when a self-deployed registry is used, which is probably not the case here.
  • Does the chains controller service account have access to read all the secrets in the cluster? This is done by the default Chains installation methods, but worth checking.
  • Is the taskrun service account linked with the push/pull registry secret? The way build tasks access this secret is not consistent. Some are smart enough to retrieve it from linked secrets, while others require mounting the secret directly. In the latter case, if the registry secret is not linked to the taskrun service account, the build task will succeed in pushing the image to the registry, but Chains will fail to push the signature and attestation.

I was under the impression that the linkage was done just under secrets, not imagePullSecrets, but from your report that does not seem to be the case. It would be great to further explore this and document accordingly.

lcarva avatar May 31 '22 18:05 lcarva

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Aug 29 '22 19:08 tekton-robot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar Sep 28 '22 19:09 tekton-robot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot avatar Oct 28 '22 19:10 tekton-robot

@tekton-robot: Closing this issue.

tekton-robot avatar Oct 28 '22 19:10 tekton-robot