
Policies Bundle not working with internal ecr repo: getting error "failed to load policies","error":"failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer

Open chit4 opened this issue 1 year ago • 2 comments

What steps did you take and what happened:

We are trying to mirror all the Trivy dependency images (trivy, trivy-db, trivy-java-db and trivy-checks) to an internal ECR repo for an air-gapped environment. I have automated fetching the images from GitHub and uploading them to ECR.
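For reference, a minimal sketch of such a mirroring step, assuming the artifacts are copied with `oras cp` (a later comment in this thread mentions using oras for the DBs) and that the ECR repository path is the upstream path with the dots in the registry host replaced by underscores (the `ghcr_io/...` layout that appears further down). The registry host and the `mirror_ref` helper are hypothetical, and the script only prints the copy commands rather than running them.

```shell
#!/bin/sh
# Hypothetical ECR registry host; replace with your own account and region.
ECR_REGISTRY="123456789012.dkr.ecr.eu-north-1.amazonaws.com"

# Map an upstream reference such as ghcr.io/aquasecurity/trivy-checks:1
# to $ECR_REGISTRY/ghcr_io/aquasecurity/trivy-checks:1.
mirror_ref() {
    src_host=${1%%/*}   # registry host, e.g. ghcr.io
    src_path=${1#*/}    # repository path and tag
    printf '%s/%s/%s\n' "$ECR_REGISTRY" "$(printf '%s' "$src_host" | tr . _)" "$src_path"
}

# Print (do not run) the copy command for each artifact.
for src in \
    ghcr.io/aquasecurity/trivy-db:2 \
    ghcr.io/aquasecurity/trivy-java-db:1 \
    ghcr.io/aquasecurity/trivy-checks:1
do
    echo "oras cp $src $(mirror_ref "$src")"
done
```

To actually perform the copies, remove the `echo` wrapper after logging in to ECR, for example with `aws ecr get-login-password --region eu-north-1 | oras login --username AWS --password-stdin "$ECR_REGISTRY"`.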

While rest all images work fine with our ecr with policy bundles repo we are seeing this following error {"level":"error","ts":"2024-08-09T09:18:39Z","logger":"policyLoader.Get misconfig bundle policies","msg":"failed to load policies","error":"failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:63\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).loadPolicies\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:144\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).Hash\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:114\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*ResourceController).SetupWithManager.(*ResourceController).reconcileResource.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/resource.go:208\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:113\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"} 
{"level":"error","ts":"2024-08-09T09:18:40Z","logger":"policyLoader.Get misconfig bundle policies","msg":"failed to load policies","error":"failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:63\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).loadPolicies\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:144\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).Eval\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:199\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.evaluate\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/helper.go:45\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*ResourceController).SetupWithManager.(*ResourceController).reconcileResource.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/resource.go:229\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:113\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}

What did you expect to happen:

We expected it to work seamlessly with our internal ECR repo.

Anything else you would like to add:

It looks like the internal ECR repo does not behave the way GHCR does. We need help fixing this.
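Going only by the error text, the artifact the operator pulled did not have exactly one layer. One way to check what actually landed in the mirror is to fetch its manifest and count the layers. The sketch below assumes `jq` is installed; the `count_layers` helper and the sample manifest (with placeholder media type and digest) are illustrative only.

```shell
#!/bin/sh
# Count the layers in an OCI image manifest read from stdin (requires jq).
count_layers() {
    jq '.layers | length'
}

# Illustrative single-layer manifest; media type, digest and size are placeholders.
sample_manifest='{"schemaVersion":2,"layers":[{"mediaType":"example/layer","digest":"sha256:placeholder","size":1}]}'

printf '%s' "$sample_manifest" | count_layers
```

Against a real registry the manifest could be piped in with `oras manifest fetch <ref> | count_layers`, where `<ref>` is the mirrored trivy-checks reference. A result other than 1 would be consistent with the "must be a single layer" error.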

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.22.0
  • Kubernetes version (use kubectl version): v1.26.15-eks-db838b0
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): macos

chit4 avatar Aug 09 '24 15:08 chit4

This issue is stale because it has been labeled with inactivity.

github-actions[bot] avatar Oct 09 '24 00:10 github-actions[bot]

How did you manage to even get this far with ECR? Are you using IRSA at all? According to #1874 this is not supported, how did you solve the credential expiration?

gnadaban avatar Oct 15 '24 20:10 gnadaban

we're in the same boat - we have the vulnerability DBs in ECR (per this doc) and that's working fine (care of the operator's service account having an associated IAM role - i.e. via IRSA), but I can't get the operator to use ECR for the checks DB. I'm considering having something populate a cache of the checks and have trivy use that via the custom checks option

badgerspoke avatar Nov 21 '24 06:11 badgerspoke

Hi guys! Sorry for the long delay.

@chit4 @gnadaban Could you confirm that this issue is still relevant with the latest version (v0.23.0, Helm chart v0.25.0)? thanks!

afdesk avatar Jan 22 '25 12:01 afdesk

I've retested this case in several environments, and this feature should work as expected with the latest version (v0.23.0, Helm chart v0.25.0). Please feel free to reopen this issue if it happens again.

afdesk avatar Jan 29 '25 06:01 afdesk

Still an issue with chart v0.25.0:

2025-03-17T02:54:42Z	ERROR	policyLoader.Get misconfig bundle policies	failed to load policies	{"error": "failed to download policies: failed to download built-in policies: download error: oci download error: failed to fetch the layer: GET https://123456789012.dkr.ecr.eu-north-1.amazonaws.com/v2/ghcr_io/aquasecurity/trivy-checks/blobs/sha256:cba49b6781cfcdeb6b063283a711ce0ddb1f36d6e2a5db69ef7d2e3f13998149: DENIED: Your authorization token has expired. Reauthenticate and try again."}
github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath

The operator Service Account exists, has suitable IAM policy and annotation to map to it (all other IRSA pieces in place and known working)

ConfigMap trivy-operator contains this for the requisite setting:

policies.bundle.insecure: 'false'
policies.bundle.oci.ref: 123456789012.dkr.ecr.eu-north-1.amazonaws.com/ghcr_io/aquasecurity/trivy-checks:0

badgerspoke avatar Mar 17 '25 03:03 badgerspoke

@badgerspoke thanks for the report. Could you try with tag 1, i.e. /aquasecurity/trivy-checks:1 instead of /aquasecurity/trivy-checks:0?

afdesk avatar Mar 17 '25 09:03 afdesk

Hey @afdesk - so we only have trivy-checks:0 in our ECR right now - we will mirror the latest 'tag' (it's not clear to me what your cadence for changing those values is TBH). So my point is the manifest/image we're referencing is definitely present but we get the access denied as opposed to a 404 or whatever ECR would return in case of missing

badgerspoke avatar Mar 18 '25 00:03 badgerspoke

> Hey @afdesk - so we only have trivy-checks:0 in our ECR right now - we will mirror the latest 'tag' (it's not clear to me what your cadence for changing those values is TBH). So my point is the manifest/image we're referencing is definitely present but we get the access denied as opposed to a 404 or whatever ECR would return in case of missing

@badgerspoke - we rolled over from v0 to v1 over 9 months ago https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks/234575740?tag=0

Regardless, you can track the releases for trivy-checks here: https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks

simar7 avatar Mar 18 '25 01:03 simar7

@badgerspoke I meant: could you update trivy-checks to the latest tag, 1? The current Trivy operator depends on the current Trivy, which needs trivy-checks:1.

cc @simar7, is that right?

afdesk avatar Mar 18 '25 04:03 afdesk

I can and have now mirrored that tag, but that cannot affect the underlying permission denied issue - the pod was requesting a valid image that does exist even if it's technically old - the impact of that would only cause issues with the actual trivy checks themselves.

So I have:

  • IRSA setup and known working (the operator can get the vuln DBs fine)
  • the latest checks DB 1

Is the logic for pulling checks somehow different to the other DBs?

badgerspoke avatar Mar 18 '25 04:03 badgerspoke

Here is a small test with Trivy directly:

$ trivy clean --all

$ trivy config  --checks-bundle-repository mirror.gcr.io/aquasec/trivy-checks:0 .
2025-03-18T11:00:51+06:00       INFO    [misconfig] Misconfiguration scanning is enabled
2025-03-18T11:00:51+06:00       INFO    [misconfig] Need to update the built-in checks
2025-03-18T11:00:51+06:00       INFO    [misconfig] Downloading the built-in checks...
2025-03-18T11:00:54+06:00       ERROR   [misconfig] Falling back to embedded checks     err="failed to download built-in policies: download error: OCI repository error: 1 error occurred:\n\t* GET https://mirror.gcr.io/v2/aquasec/trivy-checks/manifests/0: MANIFEST_UNKNOWN: Failed to fetch \"0\"\n\n"

It looks like in your case the checks DB does have a manifest. Could you re-check it?
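The three failures quoted so far in this thread point at different causes: a missing tag returns MANIFEST_UNKNOWN, an expired credential returns DENIED, and an artifact with the wrong shape fails the single-layer check. A rough triage helper, based only on the strings quoted above, might look like this. The function name and the category labels are made up for illustration.

```shell
#!/bin/sh
# Rough triage of a registry error message, using only strings quoted in this thread.
classify_pull_error() {
    case $1 in
        *"authorization token has expired"*) echo "expired-credentials" ;;
        *MANIFEST_UNKNOWN*)                  echo "missing-tag" ;;
        *"must be a single layer"*)          echo "unexpected-artifact-layout" ;;
        *)                                   echo "unknown" ;;
    esac
}

classify_pull_error 'DENIED: Your authorization token has expired. Reauthenticate and try again.'
```

By this reading, the ECR failure reported with chart v0.25.0 falls into the credentials category rather than the missing-tag one, whichever tag is mirrored.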

afdesk avatar Mar 18 '25 05:03 afdesk

Oh OK sure I'll retry with 1 and get back to you. Thanks

badgerspoke avatar Mar 18 '25 05:03 badgerspoke

Ok yesterday I mirrored trivy-checks:1 (with oras as we do for the other DBs) and set the operator to use it via the CM via policies.bundle.oci.ref as before; the deployment of this change replaces the operator pod for various organisational reasons so it has a fresh STS token - this is probably key to the issue. All was well initially - scans used it and ran fine - but today I see this in the operator logs:

2025-03-19T02:09:27Z	ERROR	policyLoader.Get misconfig bundle policies	failed to load policies	{"error": "failed to download policies: failed to download built-in policies: download error: oci download error: failed to fetch the layer: GET https://123456789012.dkr.ecr.eu-north-1.amazonaws.com/v2/ghcr_io/aquasecurity/trivy-checks/blobs/sha256:fe9a49f17a4a57ffd584f3a408bfa0d056ddf1b2dcb91005bb4948fecc9def70: DENIED: Your authorization token has expired. Reauthenticate and try again."}
github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath
	/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:65
github.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*NodeReconciler).SetupWithManager.(*NodeReconciler).reconcileNodes.func5
	/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/node.go:167
sigs.k8s.io/controller-runtime/pkg/reconcile.TypedFunc[...].Reconcile
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:328
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:288
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:249

badgerspoke avatar Mar 19 '25 02:03 badgerspoke

> All was well initially - scans used it and ran fine - but today I see this in the operator logs:

This error comes from Trivy. It looks like your token has expired, so Trivy can't check for policy updates.

afdesk avatar Mar 24 '25 05:03 afdesk

This log is from the trivy operator pod.

The token will expire, this is normal and expected behaviour for IRSA in AWS. The token is mounted into the pod automatically so is it possible the operator only reads this once, on startup for example?
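The DENIED message above is the one ECR returns for an expired registry authorization token, and ECR documents those tokens as valid for 12 hours. If the operator fetches a token once at startup and keeps reusing it (an assumption, not something confirmed in this thread), pulls would start failing roughly half a day after the pod starts, which fits the "fine yesterday, failing today" timeline. A minimal sketch of that arithmetic, with a hypothetical `token_expired` helper:

```shell
#!/bin/sh
# ECR authorization tokens are documented as valid for 12 hours.
ECR_TOKEN_TTL_SECONDS=$((12 * 60 * 60))

# token_expired ISSUED_EPOCH NOW_EPOCH
# Succeeds (exit 0) if a token issued at ISSUED_EPOCH has expired by NOW_EPOCH.
token_expired() {
    [ $(( $2 - $1 )) -ge "$ECR_TOKEN_TTL_SECONDS" ]
}

issued=$(date +%s)   # pretend the token was fetched when the pod started
if token_expired "$issued" $((issued + 13 * 60 * 60)); then
    echo "a token fetched at pod start would be expired 13 hours later"
fi
```

The real expiry of a token can be read from the `expiresAt` field returned by `aws ecr get-authorization-token`.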

badgerspoke avatar Mar 24 '25 06:03 badgerspoke

> The token will expire, this is normal and expected behaviour for IRSA in AWS. The token is mounted into the pod automatically so is it possible the operator only reads this once, on startup for example?

Trivy has a flag --skip-check-update (skip fetching rego check updates).

But it seems the Trivy operator can't pass it... I'll recheck.

afdesk avatar Mar 24 '25 06:03 afdesk

Actually I'm even more confused now - why does the operator need the DBs at all? The scan pods run trivy and they need the DBs or maybe only the server (not operator) needs them?

badgerspoke avatar Mar 24 '25 07:03 badgerspoke

@afdesk can we reopen this please? or is the issue of the operator failing to pull the checks DB sufficiently different to this original problem here to make a new issue?

badgerspoke avatar Apr 16 '25 08:04 badgerspoke

Can I ask what's the verdict/decision here please?

badgerspoke avatar May 07 '25 05:05 badgerspoke

@badgerspoke sorry for the long delay.

The verdict is clear: we should investigate and fix it. Recently we have been investigating and fixing some performance issues here, but I hope we'll resolve this one ASAP.

And you're right, we should reopen this issue for now.

afdesk avatar May 07 '25 05:05 afdesk

> Actually I'm even more confused now - why does the operator need the DBs at all? The scan pods run trivy and they need the DBs or maybe only the server (not operator) needs them?

It was done to keep the policies in a cache, which reduces the number of policy downloads from open registries.

afdesk avatar May 07 '25 05:05 afdesk

Thanks @afdesk - I didn't mean to hassle you, this project is important to us and we're trying to keep a managed centralised cache of the DBs

badgerspoke avatar May 07 '25 06:05 badgerspoke