trivy-operator
Reconciler error: creation of large VulnerabilityReports fails and the scan jobs remain
What steps did you take and what happened:
Creation of VulnerabilityReports fails if the number of detected vulnerabilities is too large.
Steps to reproduce
- Create a Pod with the python:3.5.10-buster image on a cluster with trivy-operator installed (python:3.5.10-buster has 1252 vulnerabilities.)
$ kubectl create deployment python -n default --image python:3.5.10-buster
- Scan finishes successfully, but creation of VulnerabilityReports fails
# Scan completes successfully, but no VulnerabilityReports are created and no jobs are deleted
❯ kubectl get jobs -n trivy-system
NAME COMPLETIONS DURATION AGE
scan-vulnerabilityreport-7d8db4fc84 1/1 73s 5m42s
❯ kubectl get pods -n trivy-system
NAME READY STATUS RESTARTS AGE
scan-vulnerabilityreport-7d8db4fc84-gsztw 0/1 Completed 0 4m12s
trivy-operator-54bc4769db-5djnd 1/1 Running 0 85m
❯ kubectl -n trivy-system logs scan-vulnerabilityreport-7d8db4fc84-gsztw | tr -d "\n" | base64 -d | bunzip2 | jq .ArtifactName
Defaulted container "python" out of: python, 95164937-7089-491c-ab58-7f52bc8a8cce (init)
"python:3.5.10-buster"
❯ kubectl get vulnerabilityreports -n default
No resources found in default namespace.
# Reconciler error is logged in Operator
❯ kubectl logs -n trivy-system trivy-operator-54bc4769db-5djnd
...
{"level":"error","ts":1670231487.1708708,"msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-7d8db4fc84","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-7d8db4fc84","reconcileID":"7e1ccc8e-f6eb-4080-b1dd-25dcdb150e7c","error":"rpc error: code = ResourceExhausted desc = trying to send message larger than max (2633583 vs. 2097152)","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
I am not familiar with the Kubernetes Operator implementation, but from the error message (trying to send message larger than max (2633583 vs. 2097152)) I would guess that the VulnerabilityReport being created is too large.
Normally, reconcileJobs() should delete the Job, but it gets stuck because creating the VulnerabilityReport fails.
Therefore, once the number of such stuck Jobs reaches scanJobsConcurrentLimit, no new scans are performed.
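To make the numbers in the error concrete, here is a minimal Python sketch that compares a serialized payload against the limit quoted in the log (2097152 bytes, i.e. 2 MiB). The report structure, its field names, and the description length are made up for illustration, and JSON is only a stand-in for whatever encoding the failing RPC actually uses, so the sizes are not expected to match the real request.

```python
import json

# The "max" value from the logged error: 2 MiB.
MAX_MESSAGE_BYTES = 2 * 1024 * 1024

def exceeds_limit(obj, limit=MAX_MESSAGE_BYTES):
    """Return (size_in_bytes, is_over_limit) for the JSON-serialized object."""
    size = len(json.dumps(obj).encode("utf-8"))
    return size, size > limit

# Hypothetical report with 1252 findings; the fields are illustrative only.
report = {
    "vulnerabilities": [
        {"vulnerabilityID": f"CVE-0000-{i:04d}", "description": "x" * 2000}
        for i in range(1252)
    ]
}

size, over = exceeds_limit(report)
print(f"serialized size: {size} bytes, over limit: {over}")

# How far the message in the log was over the limit.
overshoot = 2633583 - MAX_MESSAGE_BYTES
print(f"logged message was {overshoot} bytes over the limit")
```

With roughly 2 KB of text per finding, about a thousand findings are already enough to cross the limit in this toy model.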
What did you expect to happen:
Either of the following would resolve this error:
- If creation of the VulnerabilityReport fails, the Job is still deleted
- The VulnerabilityReport is created successfully
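The first option can be illustrated with a toy model in Python. This is not the operator's code: `create_report`, `reconcile`, `can_schedule`, and the `delete_on_failure` switch are hypothetical names, and the only values taken from this issue are the two sizes in the error message. The sketch shows how a job that is never deleted keeps occupying a slot under the concurrency limit, and how deleting it on failure frees the slot.

```python
MAX_MESSAGE_BYTES = 2097152  # limit quoted in the logged error

class ReportTooLarge(Exception):
    """Raised by the stand-in API call when the payload exceeds the limit."""

def create_report(size_bytes):
    # Stand-in for the call that stores a VulnerabilityReport.
    if size_bytes > MAX_MESSAGE_BYTES:
        raise ReportTooLarge(f"{size_bytes} vs. {MAX_MESSAGE_BYTES}")

def reconcile(active_jobs, job, size_bytes, delete_on_failure):
    """Process one completed scan job; return True if a report was stored."""
    try:
        create_report(size_bytes)
    except ReportTooLarge:
        if delete_on_failure:
            # Free the slot even though no report could be stored.
            active_jobs.discard(job)
        return False
    active_jobs.discard(job)
    return True

def can_schedule(active_jobs, concurrent_limit):
    """A new scan may start only while fewer jobs exist than the limit."""
    return len(active_jobs) < concurrent_limit

# Behaviour described in this issue: the job stays and blocks the only slot.
stuck = {"scan-a"}
reconcile(stuck, "scan-a", 2633583, delete_on_failure=False)

# First option above: the job is deleted even though the report failed.
freed = {"scan-a"}
reconcile(freed, "scan-a", 2633583, delete_on_failure=True)

print(can_schedule(stuck, 1), can_schedule(freed, 1))
```

Deleting the job on failure keeps scans flowing but loses the result for that workload, which is why the second option (making the report fit) is listed separately.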
Environment:
- Trivy-Operator version (use trivy-operator version): 0.7.1
- Kubernetes version (use kubectl version): v1.25.0
- OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc):