Policies bundle not working with internal ECR repo: "failed to load policies", error "failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer"
What steps did you take and what happened:
We are trying to upload all the Trivy dependency images (trivy, trivy-db, trivy-java-db and trivy-checks) to an internal ECR repo for an air-gapped environment. I have automated fetching the images from GitHub and uploading them to ECR.
While all the other images work fine from our ECR, with the policy bundle repo we see the following error:
{"level":"error","ts":"2024-08-09T09:18:39Z","logger":"policyLoader.Get misconfig bundle policies","msg":"failed to load policies","error":"failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:63\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).loadPolicies\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:144\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).Hash\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:114\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*ResourceController).SetupWithManager.(*ResourceController).reconcileResource.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/resource.go:208\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:113\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-08-09T09:18:40Z","logger":"policyLoader.Get misconfig bundle policies","msg":"failed to load policies","error":"failed to download policies: failed to download built-in policies: download error: OCI artifact must be a single layer","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:63\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).loadPolicies\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:144\ngithub.com/aquasecurity/trivy-operator/pkg/policy.(*Policies).Eval\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/policy/policy.go:199\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.evaluate\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/helper.go:45\ngithub.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*ResourceController).SetupWithManager.(*ResourceController).reconcileResource.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/resource.go:229\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:113\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
What did you expect to happen:
We expected it to work seamlessly with our internal ECR repo.
Anything else you would like to add:
It looks like the internal ECR repo does not behave the same way ghcr does. We need help fixing this.
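One way to compare the ECR copy against the ghcr original is to fetch both manifests (for example with `oras manifest fetch <ref>`) and count their layers, since the error message says the artifact must have exactly one. The sketch below only illustrates that check and is not Trivy's actual code; the two manifests are made-up sample data and the function names are hypothetical.

```python
import json


def layer_count(manifest_json: str) -> int:
    """Return the number of layers listed in an OCI image manifest."""
    manifest = json.loads(manifest_json)
    # An image index (multi-arch) carries "manifests" instead of "layers".
    if "layers" not in manifest:
        raise ValueError("no 'layers' field; this may be an image index")
    return len(manifest["layers"])


def is_single_layer(manifest_json: str) -> bool:
    """True when the manifest describes exactly one layer."""
    return layer_count(manifest_json) == 1


def _fake_layer(fill: str) -> dict:
    # Made-up descriptor: the digest and size are placeholders, not real data.
    return {
        "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
        "digest": "sha256:" + fill * 64,
        "size": 1234,
    }


# Hypothetical manifests, for illustration only.
SINGLE = json.dumps({"schemaVersion": 2, "layers": [_fake_layer("a")]})
MULTI = json.dumps({"schemaVersion": 2, "layers": [_fake_layer("a"), _fake_layer("b")]})

print(is_single_layer(SINGLE))
print(is_single_layer(MULTI))
```

If the ECR copy reports more than one layer while the ghcr original reports one, the mirroring step is the likely place where the artifact was repackaged.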
Environment:
- Trivy-Operator version (use trivy-operator version): 0.22.0
- Kubernetes version (use kubectl version): v1.26.15-eks-db838b0
- OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): macos
This issue is stale because it has been labeled with inactivity.
How did you manage to even get this far with ECR? Are you using IRSA at all? According to #1874 this is not supported, how did you solve the credential expiration?
We're in the same boat: we have the vulnerability DBs in ECR (per this doc) and that's working fine (courtesy of the operator's service account having an associated IAM role, i.e. via IRSA), but I can't get the operator to use ECR for the checks DB. I'm considering having something populate a cache of the checks and having Trivy use that via the custom checks option.
Hi guys! Sorry for the long delay.
@chit4 @gnadaban Could you confirm that this issue is still relevant with the latest version (v0.23.0, Helm chart v0.25.0)? Thanks!
I've retested this case in several environments, and this feature should work as expected with the latest version (v0.23.0, Helm chart v0.25.0). Please feel free to reopen this issue if it happens again.
Still an issue with chart v0.25.0:
2025-03-17T02:54:42Z ERROR policyLoader.Get misconfig bundle policies failed to load policies {"error": "failed to download policies: failed to download built-in policies: download error: oci download error: failed to fetch the layer: GET https://123456789012.dkr.ecr.eu-north-1.amazonaws.com/v2/ghcr_io/aquasecurity/trivy-checks/blobs/sha256:cba49b6781cfcdeb6b063283a711ce0ddb1f36d6e2a5db69ef7d2e3f13998149: DENIED: Your authorization token has expired. Reauthenticate and try again."}
github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath
The operator Service Account exists and has a suitable IAM policy and the annotation that maps it to the role (all other IRSA pieces are in place and known to work).
ConfigMap trivy-operator contains this for the requisite setting:
policies.bundle.insecure: 'false'
policies.bundle.oci.ref: 123456789012.dkr.ecr.eu-north-1.amazonaws.com/ghcr_io/aquasecurity/trivy-checks:0
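The reference above appears to follow a naming pattern where the upstream host (`ghcr.io`) becomes a path prefix (`ghcr_io`) under the ECR registry. That pattern is inferred from this one value and is an assumption, as is the helper name `mirror_ref`. A small sketch of deriving the mirrored reference from an upstream one, which may help when automating the mirror for new tags:

```python
def mirror_ref(upstream_ref: str, mirror_registry: str) -> str:
    """Map an upstream OCI reference onto a mirror registry.

    Assumed convention (inferred from the ConfigMap value, not documented):
    the upstream host has its dots replaced by underscores and is used as
    the first path segment in the mirror.
    """
    host, sep, rest = upstream_ref.partition("/")
    if not sep or not rest:
        raise ValueError(f"expected '<host>/<repository>[:tag]', got {upstream_ref!r}")
    return f"{mirror_registry}/{host.replace('.', '_')}/{rest}"


print(
    mirror_ref(
        "ghcr.io/aquasecurity/trivy-checks:1",
        "123456789012.dkr.ecr.eu-north-1.amazonaws.com",
    )
)
# prints 123456789012.dkr.ecr.eu-north-1.amazonaws.com/ghcr_io/aquasecurity/trivy-checks:1
```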
@badgerspoke thanks for the report
Could you try with tag 1, i.e. /aquasecurity/trivy-checks:1 instead of /aquasecurity/trivy-checks:0?
Hey @afdesk - so we only have trivy-checks:0 in our ECR right now - we will mirror the latest 'tag' (it's not clear to me what your cadence for changing those values is TBH). So my point is that the manifest/image we're referencing is definitely present, but we get an access denied error as opposed to a 404 or whatever ECR would return for a missing image.
@badgerspoke - we rolled over from v0 to v1 over 9 months ago https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks/234575740?tag=0
Regardless, you can track the releases for trivy-checks here: https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks
@badgerspoke I meant: could you update trivy-checks to the latest tag, 1?
The current Trivy operator depends on the current Trivy, which needs trivy-checks:1.
cc @simar7, is that right?
I can and have now mirrored that tag, but that cannot affect the underlying permission denied issue - the pod was requesting a valid image that does exist even if it's technically old - the impact of that would only cause issues with the actual trivy checks themselves.
So I have:
- IRSA setup and known working (the operator can get the vuln DBs fine)
- the latest checks DB (tag 1)
Is the logic for pulling checks somehow different to the other DBs?
Here is a small test with Trivy directly:
$ trivy clean --all
$ trivy config --checks-bundle-repository mirror.gcr.io/aquasec/trivy-checks:0 .
2025-03-18T11:00:51+06:00 INFO [misconfig] Misconfiguration scanning is enabled
2025-03-18T11:00:51+06:00 INFO [misconfig] Need to update the built-in checks
2025-03-18T11:00:51+06:00 INFO [misconfig] Downloading the built-in checks...
2025-03-18T11:00:54+06:00 ERROR [misconfig] Falling back to embedded checks err="failed to download built-in policies: download error: OCI repository error: 1 error occurred:\n\t* GET https://mirror.gcr.io/v2/aquasec/trivy-checks/manifests/0: MANIFEST_UNKNOWN: Failed to fetch \"0\"\n\n"
It looks like in your case the checks DB does have a manifest. Could you re-check it?
Oh OK sure I'll retry with 1 and get back to you. Thanks
Ok yesterday I mirrored trivy-checks:1 (with oras as we do for the other DBs) and set the operator to use it via the CM via policies.bundle.oci.ref as before; the deployment of this change replaces the operator pod for various organisational reasons so it has a fresh STS token - this is probably key to the issue. All was well initially - scans used it and ran fine - but today I see this in the operator logs:
2025-03-19T02:09:27Z ERROR policyLoader.Get misconfig bundle policies failed to load policies {"error": "failed to download policies: failed to download built-in policies: download error: oci download error: failed to fetch the layer: GET https://123456789012.dkr.ecr.eu-north-1.amazonaws.com/v2/ghcr_io/aquasecurity/trivy-checks/blobs/sha256:fe9a49f17a4a57ffd584f3a408bfa0d056ddf1b2dcb91005bb4948fecc9def70: DENIED: Your authorization token has expired. Reauthenticate and try again."}
github.com/aquasecurity/trivy-operator/pkg/policy.(*policyLoader).GetPoliciesAndBundlePath
/home/runner/work/trivy-operator/trivy-operator/pkg/policy/loader.go:65
github.com/aquasecurity/trivy-operator/pkg/configauditreport/controller.(*NodeReconciler).SetupWithManager.(*NodeReconciler).reconcileNodes.func5
/home/runner/work/trivy-operator/trivy-operator/pkg/configauditreport/controller/node.go:167
sigs.k8s.io/controller-runtime/pkg/reconcile.TypedFunc[...].Reconcile
/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:328
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:288
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:249
> All was well initially - scans used it and ran fine - but today I see this in the operator logs:
This error comes from Trivy. It looks like your token has expired, so Trivy can't check for policy updates.
This log is from the trivy operator pod.
The token will expire, this is normal and expected behaviour for IRSA in AWS. The token is mounted into the pod automatically so is it possible the operator only reads this once, on startup for example?
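To make the "read once on startup" concern concrete: ECR authorization tokens are valid for 12 hours, so a long-running process has to fetch a new one before the old one lapses rather than keep the first one forever. The sketch below shows that refresh-before-expiry pattern. It is not the operator's code; the class name, the `fetch` callback and the 5-minute margin are all assumptions for illustration, and the fetcher is faked so the example runs without AWS.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedToken:
    value: str
    expires_at: float  # seconds since the epoch


class RefreshingTokenSource:
    """Hand out a cached token and re-fetch it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (token, expires_at)
        refresh_margin: float = 300.0,           # assumed safety margin, in seconds
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[CachedToken] = None

    def token(self) -> str:
        now = self._clock()
        # Fetch on first use, or once we are inside the margin before expiry.
        if self._cached is None or now >= self._cached.expires_at - self._margin:
            value, expires_at = self._fetch()
            self._cached = CachedToken(value, expires_at)
        return self._cached.value


# Demo with a fake clock and a fake fetcher that issues 12-hour tokens.
fake_now = [0.0]
issued = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(fake_now[0])
    return f"token-{len(issued)}", fake_now[0] + 12 * 3600


source = RefreshingTokenSource(fake_fetch, clock=lambda: fake_now[0])
print(source.token())       # first call fetches: token-1
fake_now[0] = 6 * 3600
print(source.token())       # still valid, served from cache: token-1
fake_now[0] = 12 * 3600
print(source.token())       # past expiry, fetched again: token-2
```

A process that calls the fetcher only once would behave like the second call forever, which matches the symptom of scans working at first and failing the next day.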
> The token will expire, this is normal and expected behaviour for IRSA in AWS. The token is mounted into the pod automatically so is it possible the operator only reads this once, on startup for example?
Trivy has a flag --skip-check-update (skip fetching rego check updates),
but it seems the Trivy operator can't pass it... I'll recheck it.
Actually I'm even more confused now - why does the operator need the DBs at all? The scan pods run trivy and they need the DBs or maybe only the server (not operator) needs them?
@afdesk can we reopen this please? or is the issue of the operator failing to pull the checks DB sufficiently different to this original problem here to make a new issue?
Can I ask what's the verdict/decision here please?
@badgerspoke sorry for the long delay.
The verdict/decision is obvious: we should check and fix it. Last time we investigated and fixed some performance issues here, and now I hope we'll resolve this one asap.
And you're right, we should re-open this issue for a while.
> Actually I'm even more confused now - why does the operator need the DBs at all? The scan pods run trivy and they need the DBs or maybe only the server (not operator) needs them?
It was done to keep the policies in a cache, which reduces the number of policy downloads from public registries.
Thanks @afdesk - I didn't mean to hassle you, this project is important to us and we're trying to keep a managed centralised cache of the DBs