trivy-operator
OOMKilled in vulnerability scan job
What steps did you take and what happened:
Since we migrated from starboard-operator to trivy-operator, we now see many jobs terminated with OOMKilled in the trivy-operator log:
{"level":"error","ts":1658230025.0035832,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"security/scan-vulnerabilityreport-5d76c6d6d8","container":"teleport","status.reason":"OOMKilled","status.message":"","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:363\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
We also increased the CPU and memory limits to:
resources:
  limits:
    memory: 2Gi
    cpu: 2
But that did not help.
We run Trivy in ClientServer mode.
What did you expect to happen:
That there is no OOM error in the job.
Environment:
- Trivy-Operator version (use trivy-operator version): 0.1.3
- Kubernetes version (use kubectl version): 1.20
@dirien can you please share more info:
- Does the request value stay at the default for the vulnerability scan job?
- How many vulnerability scan jobs are running on the same node where the problem occurs?
- What is the memory available on the node?
Hi @chen-keinan,
ofc!
- The request is the default.
- Just this one. The others all run through.
- 12GB
Thanks. Can you please run kubectl describe pod [name] and post the output here?
from the trivy-operator?
On the pod (vulnerability scan job) that has the OOM issue.
There is a job that runs the vulnerability scan and went OOM; by default it runs in the trivy-system namespace. I need you to run the command above on that pod.
Unfortunately it is deleted too quickly :(
❯ k describe pod scan-vulnerabilityreport-55787d6c98-bq4zn -n security
Error from server (NotFound): pods "scan-vulnerabilityreport-55787d6c98-bq4zn" not found
I hope you'll have better luck catching it next time; it will help to track down the root cause of the OOM. In the meantime I will try to look for it myself.
I think this has nothing to do with luck, we need this feature IMO: https://github.com/aquasecurity/trivy-operator/issues/228 😉
@erikgb go ahead and pick it up; I assume it's more important than other PRs. And let's make it configurable (whether to use it or not).
Would be awesome!
I would love to, but waiting for other PR making it easier to test things before suggesting new features. 😊
I also encountered the same problem: the memory usage of the scan job exceeded the limit set in the K8s resource quotas.
Hi there!
I got the same OOMKilled problem, but in my case the scan-vulnerabilityreport-* pod gets killed almost every time, and each time it happened during the DB download:
kubectl -n trivy-system describe pod/scan-vulnerabilityreport-6ff5d9956f-7f9bt
...skipped...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  58s   default-scheduler  Successfully assigned trivy-system/scan-vulnerabilityreport-6ff5d9956f-7f9bt to minikube
  Normal  Pulled     57s   kubelet            Container image "ghcr.io/aquasecurity/trivy:0.30.0" already present on machine
  Normal  Created    57s   kubelet            Created container 38b439b5-8f5d-409c-8211-c8fe84733bf9
  Normal  Started    57s   kubelet            Started container 38b439b5-8f5d-409c-8211-c8fe84733bf9
  Normal  Pulled     29s   kubelet            Container image "ghcr.io/aquasecurity/trivy:0.30.0" already present on machine
  Normal  Created    28s   kubelet            Created container hello-world
  Normal  Started    28s   kubelet            Started container hello-world
kubectl -n trivy-system logs -l app.kubernetes.io/name=trivy-operator
...skipped...
{"level":"error","ts":1662634462.1451323,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-6ff5d9956f","container":"hello-world","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
Loki query {pod="scan-vulnerabilityreport-6ff5d9956f-7f9bt"} |= ``
2022-09-08 13:53:29 2022-09-08T10:53:29.745Z INFO Need to update DB
2022-09-08 13:53:29 2022-09-08T10:53:29.745Z INFO DB Repository: ghcr.io/aquasecurity/trivy-db
2022-09-08 13:53:29 2022-09-08T10:53:29.745Z INFO Downloading DB...
2022-09-08 13:53:40 707.62 KiB / 33.81 MiB [->___________________________________________________________] 2.04% ? p/s ?
...skipped...
33.81 MiB / 33.81 MiB [--------------------------------------------------] 100.00% 3.78 MiB p/s 9.1s
2022-09-08 13:54:19 Killed
The pod consumes 477M before it gets killed. After increasing the memory limit to 1500M, the OOM kills were gone.
kubectl -n trivy-system edit cm trivy-operator-trivy-config
apiVersion: v1
data:
  ...skipped...
  trivy.resources.limits.cpu: 1500m
  trivy.resources.limits.memory: 1500M
  ...skipped...
I also noticed that each scan-vulnerabilityreport-* pod downloads the complete DB, so 10 parallel scans (the default) repeat a lot of redundant work.
P.S. I think that increasing the limit is not a real solution in this case.
@SergeyBear this is the first report of an OOM issue during the DB download. It's strange, as the DB is relatively small.
Can you please share:
- How many scan jobs are running on the specific node that has the OOM issue?
- How much memory is defined for that node?
As a workaround I would suggest moving to client/server mode, where the trivy DB is downloaded only once, on the server side.
@chen-keinan At the beginning I used a fresh minikube cluster with 3 CPUs and 6 GB of RAM, then deployed trivy-operator and a dummy Node.js app on it and started to get OOMKilled, even when almost all 3 cores and 6 GB were free.
Then I installed the Prometheus and Loki stack to catch the OOM, but there are still plenty of free resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1450m (48%)  900m (30%)
  memory             816Mi (13%)  782Mi (13%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
It would be nice if someone with the same OOMKilled issue could check whether the OOM appears during the DB download and try increasing the limit...
@chen-keinan sorry, I forgot to mention: it appears even with a single scan job. After increasing the trivy limits, the OOM is gone and the reports complete with no errors. P.S. It is strange that downloading could cause an OOM, but I checked more than ten times, and every time it happened during the download...
Thanks for providing this info. We are investigating the scan job OOM issue (during the scanning process); I'll post an update shortly, once we have completed the investigation.
@chen-keinan I deployed trivy-server and set trivy-operator to ClientServer mode:
- With a 500M memory limit, the scan-vulnerabilityreport-* pod gets killed with empty logs.
- With a 1500M memory limit, the scan-vulnerabilityreport-* pod completes fine; no DB update procedure is performed.
- Enabling OPERATOR_LOG_DEV_MODE shows no OOMKilled in the logs, and scan-vulnerabilityreport-* is NOT getting killed even with a 500M memory limit... which is weird, but I switched it between true and false five times and the result was always the same: true (completed) / false (OOMKilled).
Which version of Trivy are you guys using? Did anybody try v0.31.3? We added some improvements in v0.30.1.
@knqyf263 I'm using trivy-operator 0.1.9 that uses trivy 0.30.0 image
@SergeyBear I have upgraded the trivy version to 0.31.3; it will be released with the next trivy-operator version.
Hello there,
I think I experienced the same issue: pretty much all of the scan-vulnerabilityreport jobs failed, but after downloading the DB. I can't get any container logs.
:information_source: I have installed the trivy-operator Helm chart v0.1.9 in Standalone mode, with the default resources definition.
See the trivy-operator logs here:
{"level":"error","ts":1663233467.890931,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-f48c5d464","container":"prometheus","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233467.8910558,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-f48c5d464","container":"thanos-sidecar","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233470.1496108,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-75d8fc986c","container":"loki","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233470.7312384,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-ffc4644dd","container":"alertmanager","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233471.7152555,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-8ddcf4c66","container":"kube-prometheus-stack","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233472.6912272,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-564bf7dc9c","container":"project-xxx","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233477.8117092,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-74f98796c5","container":"bdd-postgis","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233480.5189712,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-6d99dbdb57","container":"cert-manager","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233488.7973578,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-766578d848","container":"promtail","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233525.286571,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-5d7cb8455b","container":"kyverno","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233525.2866511,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-5d7cb8455b","container":"kyverno-pre","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1663233540.7202878,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"trivy-system/scan-vulnerabilityreport-5489cbf97c","container":"cluster-register","status.reason":"OOMKilled","status.message":"Killed\n","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:381\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
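Editorial note: with this many entries it can help to reduce the operator log to the affected job/container pairs. The following is a minimal sketch, assuming the operator emits one JSON object per line with the `job`, `container` and `status.reason` keys seen in the logs above; the function name `oom_killed_containers` and the abbreviated sample lines are illustrative only and not part of trivy-operator.

```python
import json


def oom_killed_containers(log_lines):
    """Return (job, container) pairs for entries whose status.reason is OOMKilled.

    Assumes one JSON object per line, as in the operator logs quoted in this
    thread. Lines that are not valid JSON (for example "...skipped...") are ignored.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed entry
        if entry.get("status.reason") == "OOMKilled":
            hits.append((entry.get("job"), entry.get("container")))
    return hits


# Abbreviated sample entries (stack traces and timestamps omitted for brevity).
sample = [
    '{"level":"error","msg":"Scan job container",'
    '"job":"trivy-system/scan-vulnerabilityreport-f48c5d464",'
    '"container":"prometheus","status.reason":"OOMKilled"}',
    "...skipped...",
    '{"level":"error","msg":"Scan job container",'
    '"job":"trivy-system/scan-vulnerabilityreport-75d8fc986c",'
    '"container":"loki","status.reason":"OOMKilled"}',
]

for job, container in oom_killed_containers(sample):
    print(job, container)
```

To use it on real output, the lines could come from a file holding the result of the `kubectl -n trivy-system logs -l app.kubernetes.io/name=trivy-operator` command shown earlier in the thread.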
@LucasVanHaaren you can try overriding the trivy version with:
kubectl patch cm trivy-operator-trivy-config -n trivy-system \
  --type merge \
  -p "$(cat <<EOF
{
  "data": {
    "trivy.imageRef": "ghcr.io/aquasecurity/trivy:0.31.3"
  }
}
EOF
)"
or wait for trivy-operator v0.2.0, which includes this version by default.
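Editorial note: hand-written JSON patch bodies are easy to get subtly wrong (for example, a trailing comma after the last key is not valid JSON). A minimal sketch of generating the merge-patch body programmatically instead; the helper names `image_ref_patch` and `is_valid_json` are hypothetical, and only the `trivy.imageRef` key and the image reference come from the thread.

```python
import json


def image_ref_patch(image_ref: str) -> str:
    """Build a JSON merge-patch body that sets trivy.imageRef under data."""
    return json.dumps({"data": {"trivy.imageRef": image_ref}})


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


patch = image_ref_patch("ghcr.io/aquasecurity/trivy:0.31.3")
print(patch)

# A trailing comma makes an otherwise identical body invalid.
print(is_valid_json('{"data": {"trivy.imageRef": "ghcr.io/aquasecurity/trivy:0.31.3",}}'))
```

The printed body could then be passed as the `-p` argument of the `kubectl patch ... --type merge` command above.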
Thanks for your response!
I just tried upgrading to the trivy:0.31.3 image, and the scan-vulnerabilityreport jobs are still OOMKilled...
I also tried increasing the memory limit to 1GiB, and it's the same; nothing changed.
Could it be a problem of nodes that are too small? I run a managed cluster with 2 worker nodes with 4 CPUs and 14GiB of memory each. It doesn't host many apps and doesn't seem overwhelmed.
It could be; it depends on the amount of workload running on your node.
You need to check the sum of the memory limits (limit.memory) of all of your workloads; it must not exceed the node's memory.
Note that trivy-operator can by default create up to 10 (configurable) scan jobs in parallel, so that needs to be taken into consideration as well.
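Editorial note: the check described above can be sketched as simple arithmetic. This is a rough estimate under stated assumptions, not how the scheduler decides placement: it pessimistically assumes that all concurrent scan jobs land on the same node, and it handles only a subset of Kubernetes quantity suffixes. The function names are hypothetical; the 500M default limit and the default of 10 concurrent jobs are the values mentioned in this thread.

```python
# Subset of Kubernetes quantity suffixes (binary and decimal); assumption:
# exponent notation and milli-units are not needed for memory values here.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '500M' or '2Gi' to bytes."""
    q = quantity.strip()
    # Check two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(q)  # plain byte count


def worst_case_scan_memory(scan_limit: str, concurrent_jobs: int) -> int:
    """Memory needed if every concurrent scan job reaches its limit at once."""
    return parse_memory(scan_limit) * concurrent_jobs


def fits_on_node(node_memory: str, workload_limits, scan_limit: str,
                 concurrent_jobs: int) -> bool:
    """True if existing limits plus worst-case scan jobs stay within node memory."""
    total = sum(parse_memory(limit) for limit in workload_limits)
    total += worst_case_scan_memory(scan_limit, concurrent_jobs)
    return total <= parse_memory(node_memory)


# Defaults from the thread: 10 parallel jobs at a 500M limit each.
print(worst_case_scan_memory("500M", 10))
# Hypothetical node: 14Gi of memory with 4Gi of existing workload limits.
print(fits_on_node("14Gi", ["4Gi"], "1500M", 2))
```

Note that `500M` (decimal) and `500Mi` (binary) are different amounts, which is why the suffix table keeps both families.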
A 1 GB limit is too low. I managed to get rid of OOMKilled only with a 1.5 GB memory limit, provided you have enough free memory in the cluster, of course.
Also try reducing the number of parallel scan jobs in the operator.
@chen-keinan After checking, I confirm that the sum of the memory limits exceeds the amount of memory on the node.
@SergeyBear Thanks for the tips. I will try it soon with a 1.5GiB memory limit and only 2 parallel scan jobs, because cluster reliability is much more important than the speed of image scanning!
I will also use a trivy server instance to avoid downloading the DB every time; maybe this will help too.
Hey everybody, I applied all your tips (using a dedicated trivy server instance, setting a 1.5G memory limit and reducing parallel scan jobs to 2) and now it works!
PS: I sometimes see more than 2 scan jobs in parallel, but I no longer get OOMKilled, so it's great.
Thanks a lot :smiley:
I just checked the latest trivy-operator 0.3.0 on a fresh minikube cluster (3 CPUs, 6 GB RAM) with only sealed-secrets and trivy-server installed. It still gets OOMKilled with the default 500M memory limit and OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT set to 1. After increasing the memory limit to 1500M, the OOMKilled goes away. There is probably a short spike in memory consumption.