No vulnerability reports for some namespaces; scan pods go to Error immediately after init
What steps did you take and what happened: We have transferred our static trivy-operator deployment to a new cluster infrastructure.
We are receiving vulnerability reports for some namespaces, but not for others. All namespaces should be checked.
For further testing, we limited ourselves to one of the affected namespaces and saw that the scanner pods went to Error immediately after the init container completed. We did not see any errors in the logs that would help us further.
Strangely enough, we received configaudit reports for all namespaces.
What did you expect to happen: We expect to receive vulnerability reports for all namespaces or an error message in debug mode that will help us further locate the error.
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
All images are stored in a separate Harbor repository. Image pulls are authenticated at the node level with a service user.
Here are the logs with debug = true from the operator
In the trivy-operator-trivy-config file, we use the following parameters so that the vulnerability database can be downloaded:
trivy.httpProxy, trivy.httpsProxy
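For context, a minimal sketch of how those two keys might look in the trivy-operator-trivy-config ConfigMap (the proxy URL below is a placeholder, not a value from this cluster; the namespace is assumed from the scan-job names later in this thread):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: trivy-operator-trivy-config
  namespace: trivy-system
data:
  # Placeholder proxy endpoint -- substitute your own.
  trivy.httpProxy: "http://proxy.example.internal:3128"
  trivy.httpsProxy: "http://proxy.example.internal:3128"
```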
We also use CiliumNetworkPolicy. Even after we deleted everything, the behavior was identical.
Environment:
- Nutanix Kubernetes Platform 2.16
- Trivy-Operator version: 0.29.0
- Kubernetes version (kubectl version): Server Version: v1.33.2; Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
- OS: Node OS: Rocky Linux 9.6 (Blue Onyx)
We hope you can help us. At the moment, we have no idea what else we could try to find the cause or fix the error.
@mavo86 thanks for the report! Could you share which namespaces are skipped, for my understanding? I'm looking at your logs. Thanks!
Hello,
The log should only contain one of the affected namespaces. I have limited the configuration to one namespace for better analysis.
monitoring-karbon
kubectl get vulnerabilityreports -n monitoring-karbon -o wide is empty, right?
Yes
kubectl get vulnerabilityreports -n monitoring-karbon -o wide No resources found in monitoring-karbon namespace.
If you want, I can extend the configuration back to all namespaces and checks and upload the new log.
yes, please. I'd like to reproduce it.
As requested, a log with our standard settings.
Attached, as described, is an output of the configauditreports for this namespace.
kubectl get configauditreports.aquasecurity.github.io --namespace monitoring-karbon -o wide
NAME                                          SCANNER   AGE   CRITICAL   HIGH   MEDIUM   LOW
daemonset-cadvisor                            Trivy     12m   0          1      4        4
daemonset-node-exporter                       Trivy     12m   0          4      5        6
ingress-ingress-grafana                       Trivy     12m   0          0      0        0
ingress-ingress-prometheus                    Trivy     12m   0          0      0        0
replicaset-grafana-96f8cb4fd                  Trivy     12m   0          1      3        6
replicaset-prometheus-deployment-66df96b85f   Trivy     12m   0          1      4        6
service-cadvisor                              Trivy     12m   0          0      0        0
service-grafana                               Trivy     12m   0          0      0        0
service-kube-state-metrics                    Trivy     12m   0          0      0        0
service-node-exporter                         Trivy     12m   0          0      0        0
service-prometheus-service                    Trivy     12m   0          0      0        0
statefulset-kube-state-metrics                Trivy     12m   0          1      1        5
For example, vulnerability reports were created for the following namespaces:
kube-system, metallb-system, ntnx-system
@mavo86 thanks for the details, I'll take a look
You checked the configauditreports; does that mean the vulnerabilityreports are also missing?
The error is that vulnerability reports are not generated in all namespaces. I checked the configauditreports to see whether no reports at all are generated for these namespaces, or whether only the vulnerability reports are missing.
I also took another look at the exposed secret reports. In the namespaces where the vulnerability reports are missing, there are also no exposed secret reports.
However, I cannot judge whether exposed secret reports should be there.
@mavo86 thanks a lot for your assist, I'll take a look and will update you here.
@mavo86 I took a look at the logs again.
At first glance, there are some known issues here: scan jobs for some images (cilium-agent, mount-bpf-fs, prerequisite-kustomization-wait, etc.) were broken by OOMKilled; it seems you need to increase the memory limits to resolve that.
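A quick, hedged way to confirm the OOMKilled hypothesis is to look for containers whose last termination reason was OOMKilled (the trivy-system namespace is taken from the logs above; the grep over pretty-printed JSON is deliberately crude):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # kubectl -o json pretty-prints, so the reason appears as `"reason": "OOMKilled"`.
  kubectl get pods -n "$NS" -o json \
    | grep -B3 '"reason": "OOMKilled"' \
    || echo "no OOMKilled containers found in $NS"
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```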
The next error means trivy-operator can't schedule a node-collector job. This usually happens when the operator doesn't have sufficient permissions on managed Kubernetes clusters.
2025-10-16T09:58:43Z ERROR Reconciler error {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"srvhvbgclla0299-md-0-pffgz-s9sqd-wwntt"}, "namespace": "", "name": "srvhvbgclla0299-md-0-pffgz-s9sqd-wwntt", "reconcileID": "3a5f6d0f-c58d-47ba-be4a-aa43a820f34d", "error": "creating job: no compliance commands found"}
However, there are some errors in the jobs whose cause is indeed difficult to determine from the logs. For example: trivy-system/scan-vulnerabilityreport-5b69946b5f.
2025-10-16T09:52:42Z ERROR reconciler.scan job Scan job container {"job": "trivy-system/scan-vulnerabilityreport-5b69946b5f", "container": "node-exporter", "status.reason": "Error", "status.message": ""}
I’d suggest setting a ScanJobTTL and checking the messages there.
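Concretely, the idea is roughly the following (treat the env var name OPERATOR_SCAN_JOB_TTL, the deployment name, and the label selector as assumptions to verify against your own deployment):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # Keep completed scan jobs around for an hour so they can be inspected.
  kubectl -n "$NS" set env deployment/trivy-operator OPERATOR_SCAN_JOB_TTL=1h
  # After the next reconcile, the retained jobs and their pods are inspectable:
  kubectl -n "$NS" get jobs
  kubectl -n "$NS" describe pods -l app.kubernetes.io/managed-by=trivy-operator
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```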
At this point, it seems to me that the Trivy Operator might not have access to the container images it’s trying to scan — a situation that’s quite typical for managed Kubernetes clusters.
Could you please verify the permissions?
@afdesk
Thank you for your feedback. I have increased the TTL and added a description and the log. These can be found in the attachment.
I also increased the RAM limit to 1 GB, but if that has nothing to do with the main problem, it's irrelevant for now.
Attached are the RBACs, as far as I remember, these are the defaults from the static deployment of GitHub.
02_trivy-operator_rbac_v2.yaml
I also tried to make login credentials for our Harbor instance available via the parameter OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES. But I can't say whether the operator is even trying to use them.
As far as I understand, the format is '{"namespace":"secretsname"}'
OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES: '{"trivy-system":"nkp-repo-proxy"}'
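One thing worth double-checking when pasting that value: JSON only parses with straight quotes, and smart quotes are easily introduced by copy-paste from documents or chat tools. A tiny sketch of the difference:

```shell
# Straight-quoted JSON, as the operator presumably expects:
good='{"trivy-system":"nkp-repo-proxy"}'
# The same value with smart quotes, as it sometimes ends up after copy-paste:
bad='{“trivy-system”:“nkp-repo-proxy”}'

for v in "$good" "$bad"; do
  case "$v" in
    *“*|*”*) echo "contains smart quotes, will not parse: $v" ;;
    *)       echo "looks ok: $v" ;;
  esac
done
```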
And here are the logs from the last test run
the scan job couldn't run the following command:
$ trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0
Could you check whether this image is accessible for the trivy-operator?
my suggestion: the same issue is on EKS for some images: https://github.com/aquasecurity/trivy-operator/issues/2369#issuecomment-2626633611
Yes, I'll check that out, but you'll have to tell me how to do it.
I'm not familiar with Nutanix Kubernetes Platform, but I'd try to run this image (maybe via a new job), or use something like ssh / crictl to pull it.
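As a sketch of the "new job" idea: a throwaway pod whose only purpose is to pull the image surfaces pull/auth failures in its events (the image reference is the one from the failing scan; pod name is arbitrary):

```shell
IMAGE="hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0"
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # The pod only needs to pull the image; `true` exits immediately after start.
  kubectl run pull-test --image="$IMAGE" --restart=Never --command -- true || true
  # ErrImagePull / ImagePullBackOff would show up in the events:
  kubectl describe pod pull-test | sed -n '/Events:/,$p' || true
  kubectl delete pod pull-test --ignore-not-found || true
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```

Note that a successful kubelet pull doesn't necessarily mean the scan job can reach the registry: in standalone mode the trivy process talks to the registry API itself rather than going through the node's container runtime and its node-level credentials.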
Info:
We have included Nutanix Kubernetes Platform for informational purposes only. Other users may have a similar environment. This is a preconfigured environment with node images and a default configuration for Kubernetes from the manufacturer Nutanix.
I think we can check everything, but we may need to ask how. For your instructions, I think it's sufficient to treat it from the perspective of a normal Kubernetes cluster.
Perhaps I misunderstood you. I thought I should run a command from the trivy-operator pod to check whether the image is accessible.
If that's the case, what exactly would help you? Roll out a daemon set in the trivy-system namespace and use the trivy-operator service account?
I'm not very familiar with this environment yet and may need to ask for clarification myself.
@mavo86 I need to think over how I can help you more now.
As said @simar7 here
Trivy operator makes no guarantees on managed Kubernetes flavors, so it's quite possible we don't fully support it.
Unfortunately, I'm not able to test it on Nutanix Kubernetes Platform right now
@afdesk: That's okay. It would be great if you could help us with the problem, but I understand that it's not possible to support all types of managed Kubernetes clusters.
Maybe you can help me understand how you saw that it's related to/caused by the following command.
$ trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0
As far as I can see from the description of the scan container, this is an argument for starting. Or did you see that in the Trivy-operator logs?
But in the container events it says:
Normal   Created   2m31s   kubelet   Created container: cadvisor
Normal   Started   2m31s   kubelet   Started container cadvisor
To me, this sounds like the scanner was able to pull the image and an error then occurred later.
Is there a parameter for the Trivy-operator to not only enable debugging but also set the log level even deeper?
@mavo86 sorry, missed your message.
how you saw that it's related to/caused by the following command.
The logs contain the following, so I concluded it was a permission error:
trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0 --cache-dir /tmp/trivy/.cache --format json --image-config-scanners secret --scanners vuln,secret --skip-db-update --slow --list-all-pkgs --output /tmp/scan/result_cadvisor.json 2>/tmp/scan/result_cadvisor.json.log && bzip2 -c /tmp/scan/result_cadvisor.json | base64
State: Terminated
Reason: Error
Exit Code: 1
Is there a parameter for the Trivy-operator to not only enable debugging but also set the log level even deeper?
Unfortunately, no. I think we need to add it ASAP - #2725.
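One detail from the quoted scan command above: it redirects trivy's stderr into result_cadvisor.json.log inside the pod, which is why the container log stays empty even when the scan fails. A hedged way to see the real error is to run the same trivy invocation in a throwaway pod without the redirect (image tag and namespace are assumptions here):

```shell
IMAGE="hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0"
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # Run trivy directly so its stderr goes to the pod log instead of a file.
  kubectl run trivy-debug -n trivy-system --restart=Never \
    --image=aquasec/trivy:latest -- image --scanners vuln "$IMAGE" || true
  # Once the pod has finished (or failed), the real error should be visible:
  kubectl logs -n trivy-system trivy-debug || true
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```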
@afdesk:
Today, I noticed that in the Harbor instance we use as a repo, there is a difference in the GUI between the images for which we receive reports and the others.
In the /nkp path, we have to push the images so that the Nutanix Kubernetes Platform can provide its own containers. There, I see "Artifacts: 1".
The other paths are configured as "proxy cache" and show 0 artifacts.
Are you aware of anything we need to consider in Harbor as a “proxy cache” in connection with Trivy Operator?
Are you aware of anything we need to consider in Harbor as a “proxy cache” in connection with Trivy Operator?
Interesting question. I am using an internal unauthenticated Harbor on bare-metal (non-managed) Kubernetes and I am experiencing the same behaviour described here. At first I thought it was a storage error, but once I solved that I was left with scan pods simply exiting like this: no RAM or CPU limits set, no OOMs, and the job just exiting after a pull. I'm going to set the TTL and see whether it exits right at the image scan command like this.
Is there any new information on this case?
Is there anything to consider in relation to Harbor as a “proxy cache” and Trivy Operator?
We're currently experiencing the same problem in our Red Hat OpenShift cluster (OpenShift version 4.20.3), which has its nodes as AWS EC2 instances. The Kubernetes server version is v1.33.5. We deployed the trivy-operator in version 0.29.0 via Helm (chart version 0.31.0), operating it in Standalone mode. The scan job pods exit almost immediately after the init container runs. The init container logs show that pulling the vulnerability DB was successful. Afterwards, the main containers exit with exit code 1, without any logs.
We see many repetitions of the following error message in the trivy-operator pod logs:
{"level":"error","ts":"2025-12-16T13:44:36Z","logger":"reconciler.scan job","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-66dd687756","container":"workbench-controller","status.reason":"Error","status.message":"","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).completedContainers\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:441\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).SetupWithManager.(*ScanJobController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:103\nsigs.k8s.io/controller-runtime/pkg/reconcile.TypedFunc[...].Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:134\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:461\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:421\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:296"}
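For the "exit code 1 with no logs" case, dumping the terminated containers' last state at least confirms which container in each scan pod failed and with what reason (the trivy-system namespace is taken from the log line above; adjust to wherever your scan pods run):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # One line per pod, then one indented line per container with its
  # last termination reason and exit code.
  kubectl get pods -n "$NS" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .status.containerStatuses[*]}{"  "}{.name}{" -> "}{.lastState.terminated.reason}{" ("}{.lastState.terminated.exitCode}{")"}{"\n"}{end}{end}'
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```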