No vulnerability reports for some namespaces; scan pods go to Error immediately after init
What steps did you take and what happened: We have transferred our static trivy-operator deployment to a new cluster infrastructure.
We are receiving vulnerability reports for some namespaces, but not for others. All namespaces should be checked.
For further testing, we limited ourselves to one of the affected namespaces and saw that the scanner pods went to Error immediately after the init container completed. We did not see any errors in the logs that would help us further.
Strangely enough, we received configaudit reports for all namespaces.
What did you expect to happen: We expect to receive vulnerability reports for all namespaces or an error message in debug mode that will help us further locate the error.
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
All images are stored in a separate Harbor repository. Image pulls are authenticated at the node level with a service user.
Here are the logs with debug = true from the operator
In the trivy-operator-trivy-config file, we use the following parameters so that the vulnerability database can be downloaded:
trivy.httpProxy, trivy.httpsProxy
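For context, a minimal sketch of how those two keys might look in the trivy-operator-trivy-config ConfigMap (the proxy URL below is a placeholder, not a value from this cluster; the namespace is assumed from the scan-job names later in this thread):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: trivy-operator-trivy-config
  namespace: trivy-system
data:
  # Placeholder proxy endpoint -- substitute your own.
  trivy.httpProxy: "http://proxy.example.internal:3128"
  trivy.httpsProxy: "http://proxy.example.internal:3128"
```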
We also use CiliumNetworkPolicy. Even after we deleted everything, the behavior was identical.
Environment:
- Nutanix Kubernetes Platform 2.16
- Trivy-Operator version: 0.29.0
- Kubernetes version (kubectl version): Server Version: v1.33.2; Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
- OS: Node OS: Rocky Linux 9.6 (Blue Onyx)
We hope you can help us. At the moment, we have no idea what else we could try to find the cause or fix the error.
@mavo86 thanks for the report! Could you share which namespaces are skipped, for my understanding? I'm looking at your logs. Thanks!
Hello,
The log should only contain one of the affected namespaces. I have limited the configuration to one namespace for better analysis.
monitoring-karbon
kubectl get vulnerabilityreports -n monitoring-karbon -o wide is empty, right?
Yes
kubectl get vulnerabilityreports -n monitoring-karbon -o wide No resources found in monitoring-karbon namespace.
If you want, I can extend the configuration back to all namespaces and checks and upload the new log.
yes, please. I'd like to reproduce it.
As requested, a log with our standard settings.
Attached, as described, is an output of the configauditreports for this namespace.
kubectl get configauditreports.aquasecurity.github.io --namespace monitoring-karbon -o wide
NAME                                          SCANNER   AGE   CRITICAL   HIGH   MEDIUM   LOW
daemonset-cadvisor                            Trivy     12m   0          1      4        4
daemonset-node-exporter                       Trivy     12m   0          4      5        6
ingress-ingress-grafana                       Trivy     12m   0          0      0        0
ingress-ingress-prometheus                    Trivy     12m   0          0      0        0
replicaset-grafana-96f8cb4fd                  Trivy     12m   0          1      3        6
replicaset-prometheus-deployment-66df96b85f   Trivy     12m   0          1      4        6
service-cadvisor                              Trivy     12m   0          0      0        0
service-grafana                               Trivy     12m   0          0      0        0
service-kube-state-metrics                    Trivy     12m   0          0      0        0
service-node-exporter                         Trivy     12m   0          0      0        0
service-prometheus-service                    Trivy     12m   0          0      0        0
statefulset-kube-state-metrics                Trivy     12m   0          1      1        5
For example, vulnerability reports were created for the following namespaces:
kube-system, metallb-system, ntnx-system
@mavo86 thanks for the details, I'll take a look
You checked the configauditreports; does that mean the vulnerabilityreports are also missing?
The error is that vulnerability reports are not generated in all namespaces. I checked the configauditreports to see whether no reports at all are generated for these namespaces, or whether only the vulnerability reports are missing.
I also took another look at the exposed secret reports. In the namespaces where the vulnerability reports are missing, there are also no exposed secret reports.
However, I cannot judge whether exposed secret reports should be there.
@mavo86 thanks a lot for your assist, I'll take a look and will update you here.
@mavo86 I took a look at the logs again.
At first glance, there are some known issues here: scan jobs for some images (cilium-agent, mount-bpf-fs, prerequisite-kustomization-wait, etc.) were broken by OOMKilled; it seems you need to increase the memory limits to resolve that.
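A quick, hedged way to confirm the OOMKilled hypothesis is to look for containers whose last termination reason was OOMKilled (the trivy-system namespace is taken from the logs above; the grep over pretty-printed JSON is deliberately crude):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # kubectl -o json pretty-prints, so the reason appears as `"reason": "OOMKilled"`.
  kubectl get pods -n "$NS" -o json \
    | grep -B3 '"reason": "OOMKilled"' \
    || echo "no OOMKilled containers found in $NS"
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```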
The next error means trivy-operator can't schedule a node-collector job. This usually happens when the operator doesn't have sufficient permissions on managed Kubernetes clusters.
2025-10-16T09:58:43Z ERROR Reconciler error {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"srvhvbgclla0299-md-0-pffgz-s9sqd-wwntt"}, "namespace": "", "name": "srvhvbgclla0299-md-0-pffgz-s9sqd-wwntt", "reconcileID": "3a5f6d0f-c58d-47ba-be4a-aa43a820f34d", "error": "creating job: no compliance commands found"}
However, there are some errors in the jobs whose cause is indeed difficult to determine from the logs. For example: trivy-system/scan-vulnerabilityreport-5b69946b5f.
2025-10-16T09:52:42Z ERROR reconciler.scan job Scan job container {"job": "trivy-system/scan-vulnerabilityreport-5b69946b5f", "container": "node-exporter", "status.reason": "Error", "status.message": ""}
I’d suggest setting a ScanJobTTL and checking the messages there.
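Concretely, the idea is roughly the following (treat the env var name OPERATOR_SCAN_JOB_TTL, the deployment name, and the label selector as assumptions to verify against your own deployment):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # Keep completed scan jobs around for an hour so they can be inspected.
  kubectl -n "$NS" set env deployment/trivy-operator OPERATOR_SCAN_JOB_TTL=1h
  # After the next reconcile, the retained jobs and their pods are inspectable:
  kubectl -n "$NS" get jobs
  kubectl -n "$NS" describe pods -l app.kubernetes.io/managed-by=trivy-operator
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```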
At this point, it seems to me that the Trivy Operator might not have access to the container images it’s trying to scan — a situation that’s quite typical for managed Kubernetes clusters.
Could you please verify the permissions?
@afdesk
Thank you for your feedback. I have increased the TTL and added a description and the log. These can be found in the attachment.
I also increased the RAM limit to 1 GB, but if that has nothing to do with the main problem, it's irrelevant for now.
Attached are the RBACs, as far as I remember, these are the defaults from the static deployment of GitHub.
02_trivy-operator_rbac_v2.yaml
I also tried to make login credentials for our Harbor instance available via the parameter OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES. But I can't say whether the operator is even trying to use them.
As far as I understand, the format is '{"namespace":"secretsname"}'
OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES: '{"trivy-system":"nkp-repo-proxy"}'
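One thing worth double-checking when pasting that value: JSON only parses with straight quotes, and smart quotes are easily introduced by copy-paste from documents or chat tools. A tiny sketch of the difference:

```shell
# Straight-quoted JSON, as the operator presumably expects:
good='{"trivy-system":"nkp-repo-proxy"}'
# The same value with smart quotes, as it sometimes ends up after copy-paste:
bad='{“trivy-system”:“nkp-repo-proxy”}'

for v in "$good" "$bad"; do
  case "$v" in
    *“*|*”*) echo "contains smart quotes, will not parse: $v" ;;
    *)       echo "looks ok: $v" ;;
  esac
done
```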
And here are the logs from the last test run
the scan job couldn't run the following command:
$ trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0
Could you check whether this image is accessible for the trivy-operator?
my suggestion: the same issue is on EKS for some images: https://github.com/aquasecurity/trivy-operator/issues/2369#issuecomment-2626633611
Yes, I'll check that out, but you'll have to tell me how to do it.
I'm not familiar with Nutanix Kubernetes Platform, but I'd try to run this image (maybe via a new job), or use something like ssh / crictl to pull it.
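As a sketch of the "new job" idea: a throwaway pod whose only purpose is to pull the image surfaces pull/auth failures in its events (the image reference is the one from the failing scan; pod name is arbitrary):

```shell
IMAGE="hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0"
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # The pod only needs to pull the image; `true` exits immediately after start.
  kubectl run pull-test --image="$IMAGE" --restart=Never --command -- true || true
  # ErrImagePull / ImagePullBackOff would show up in the events:
  kubectl describe pod pull-test | sed -n '/Events:/,$p' || true
  kubectl delete pod pull-test --ignore-not-found || true
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```

Note that a successful kubelet pull doesn't necessarily mean the scan job can reach the registry: in standalone mode the trivy process talks to the registry API itself rather than going through the node's container runtime and its node-level credentials.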
Info:
We have included Nutanix Kubernetes Platform for informational purposes only. Other users may have a similar environment. This is a preconfigured environment with node images and a default configuration for Kubernetes from the manufacturer Nutanix.
I think we can check everything, but we may need to ask how. For your instructions, I think it's sufficient to treat it from the perspective of a normal Kubernetes cluster.
Perhaps I misunderstood you. I thought I should run a command from the trivy-operator pod to check whether the image is accessible.
If that's the case, what exactly would help you? Roll out a daemon set in the trivy-system namespace and use the trivy-operator service account?
I'm not very familiar with this environment yet and may need to ask for clarification myself.
@mavo86 I need to think over how I can help you more now.
As said @simar7 here
Trivy operator makes no guarantees on managed Kubernetes flavors, so it's quite possible we don't fully support it.
Unfortunately, I'm not able to test it on Nutanix Kubernetes Platform right now
@afdesk: That's okay. It would be great if you could help us with the problem, but I understand that it's not possible to support all types of managed Kubernetes clusters.
Maybe you can help me understand how you saw that it's related to/caused by the following command.
$ trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0
As far as I can see from the description of the scan container, this is an argument for starting. Or did you see that in the Trivy-operator logs?
But in the container events it says:
Normal   Created   2m31s   kubelet   Created container: cadvisor
Normal   Started   2m31s   kubelet   Started container cadvisor
To me, this sounds like the scanner was able to pull the image and an error then occurred later.
Is there a parameter for the Trivy-operator to not only enable debugging but also set the log level even deeper?
@mavo86 sorry, missed your message.
how you saw that it's related to/caused by the following command.
The logs contain the following, so I concluded it was a permission error:
trivy image hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0 --cache-dir /tmp/trivy/.cache --format json --image-config-scanners secret --scanners vuln,secret --skip-db-update --slow --list-all-pkgs --output /tmp/scan/result_cadvisor.json 2>/tmp/scan/result_cadvisor.json.log && bzip2 -c /tmp/scan/result_cadvisor.json | base64
State: Terminated
Reason: Error
Exit Code: 1
Is there a parameter for the Trivy-operator to not only enable debugging but also set the log level even deeper?
Unfortunately, no. I think we need to add it ASAP - #2725.
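One detail from the quoted scan command above: it redirects trivy's stderr into result_cadvisor.json.log inside the pod, which is why the container log stays empty even when the scan fails. A hedged way to see the real error is to run the same trivy invocation in a throwaway pod without the redirect (image tag and namespace are assumptions here):

```shell
IMAGE="hvbg-container-repository.itshessen.hessen.de/github/google/cadvisor:v0.53.0"
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # Run trivy directly so its stderr goes to the pod log instead of a file.
  kubectl run trivy-debug -n trivy-system --restart=Never \
    --image=aquasec/trivy:latest -- image --scanners vuln "$IMAGE" || true
  # Once the pod has finished (or failed), the real error should be visible:
  kubectl logs -n trivy-system trivy-debug || true
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```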
@afdesk:
Today, I noticed that in the Harbor instance we use as a repo, there is a difference in the GUI between the images for which we receive reports and the others.
In the /nkp path, we have to push the images so that the Nutanix Kubernetes Platform can provide its own containers. There, I see "Artifacts: 1".
The other paths are configured as "proxy cache" and show 0 artifacts.
Are you aware of anything we need to consider in Harbor as a “proxy cache” in connection with Trivy Operator?
Are you aware of anything we need to consider in Harbor as a “proxy cache” in connection with Trivy Operator?
Interesting question. I am using an internal unauthenticated Harbor on bare-metal (non-managed) Kubernetes and I am experiencing the same behaviour described here. At first I thought it was a storage error, but once I solved that I was left with scan pods simply exiting like this: no RAM or CPU limits set, no OOMs, and the job just exiting after a pull. I'm going to set the TTL and see whether it exits right at the image scan command like this.
Is there any new information on this case?
Is there anything to consider in relation to Harbor as a “proxy cache” and Trivy Operator?
We're currently experiencing the same problem in our Red Hat OpenShift cluster (OpenShift version 4.20.3), which has its nodes as AWS EC2 instances. The Kubernetes server version is v1.33.5. We deployed the trivy-operator in version 0.29.0 via Helm (chart version 0.31.0), operating it in Standalone mode. The scan job pods exit almost immediately after the init container runs. The init container logs show that pulling the vulnerability DB was successful. Afterwards, the main containers exit with exit code 1, without any logs.
We see many repetitions of the following error message in the trivy-operator pod logs:
{"level":"error","ts":"2025-12-16T13:44:36Z","logger":"reconciler.scan job","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-66dd687756","container":"workbench-controller","status.reason":"Error","status.message":"","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).completedContainers\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:441\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).SetupWithManager.(*ScanJobController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:103\nsigs.k8s.io/controller-runtime/pkg/reconcile.TypedFunc[...].Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:134\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:461\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:421\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:296"}
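For the "exit code 1 with no logs" case, dumping the terminated containers' last state at least confirms which container in each scan pod failed and with what reason (the trivy-system namespace is taken from the log line above; adjust to wherever your scan pods run):

```shell
NS=trivy-system
# Only attempt this when a cluster is actually reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  # One line per pod, then one indented line per container with its
  # last termination reason and exit code.
  kubectl get pods -n "$NS" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .status.containerStatuses[*]}{"  "}{.name}{" -> "}{.lastState.terminated.reason}{" ("}{.lastState.terminated.exitCode}{")"}{"\n"}{end}{end}'
else
  echo "no cluster reachable from this shell; run it against your cluster"
fi
```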