trivy-operator
Running operator on containerd cuts the logs in client/server mode
What steps did you take and what happened:
I took the latest version of trivy-operator and started it in client/server mode. It worked for most pods, but at random it starts to fail because of cut logs, which it then cannot parse when retrieving them. This doesn't happen when running on dockerd as the CRI. The same can happen when you run multiple pods issuing the same client command against the server.
What did you expect to happen:
I expect logs not to get cut when running on containerd as the CRI.
Anything else you would like to add:
What else can be done is to revert the trivy image version to 0.29.2 (in the values.yaml of the Helm chart, see the sketch below), since in the new 0.30.* versions memory usage is high, which OOMs the pods created by the scan job because the limits weren't changed when the new release was created.
[Miscellaneous information that will assist in solving the issue.]
What probably causes the problem is that having multiple clients opens multiple RPC channels, and some of them get closed/garbage-collected in the middle of transferring the information.
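A minimal sketch of that values.yaml override, assuming the chart exposes the scanner image under trivy.image; the exact key names may differ between chart versions, so verify against the chart's own values.yaml:

```yaml
# Hypothetical Helm values override pinning the Trivy scanner image to 0.29.2.
# Key names (trivy.image.*) are assumptions; check your chart version's values.yaml.
trivy:
  image:
    registry: ghcr.io
    repository: aquasecurity/trivy
    tag: "0.29.2"
```

This would then be applied with something like helm upgrade -f values.yaml against your trivy-operator release (release name and namespace depend on your installation).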
- Trivy-Operator version (use trivy-operator version): latest
- Kubernetes version (use kubectl version): 1.21.14
- OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): gardenlinux
@1003n40 thank you for the input. Can you please add logging or additional info on the failure?
@chen-keinan could this also be related to kubernetes log rotation configuration?
@1003n40 you can change the trivy image tag by setting this value in the trivy-operator-trivy-config ConfigMap:
imageRef: ghcr.io/aquasecurity/trivy:0.29.1
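For reference, a sketch of what that ConfigMap could look like once edited; the namespace below is an assumption (adjust to wherever the operator is installed), and some versions key this as trivy.imageRef rather than imageRef, so check the existing ConfigMap first:

```yaml
# Sketch of the trivy-operator-trivy-config ConfigMap with the scanner image pinned.
# Namespace and exact key name are assumptions; inspect the live object with:
#   kubectl get configmap trivy-operator-trivy-config -n <operator-namespace> -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: trivy-operator-trivy-config
  namespace: trivy-system        # assumption: use your operator's namespace
data:
  imageRef: ghcr.io/aquasecurity/trivy:0.29.1   # some versions use the key trivy.imageRef
```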
This issue is stale because it has been labeled with inactivity.
Hi, I'm not sure if this is related, but we are seeing the same behavior when running trivy client/server in a k3d/k3s cluster. The trivy client runs in a kubernetes Job and sometimes the scan results are cut when we fetch the logs from the Job.
We are only able to reproduce this behavior on GitHub when using the default runners (https://github.com/statnett/image-scanner-operator/actions/runs/4419186007/jobs/7747301776#step:13:381) and when running the tests on an old Mac. And the behavior typically only occurs when the scan results are large.
We suspect that this is related to hardware constraints, since we are only able to reproduce this when running k3d/k3s on machines with limited CPU/memory resources.
The corresponding log on the node looks like this:
2023-03-14T18:54:13.784447539Z stdout F {
2023-03-14T18:54:13.784566339Z stdout F "fixedVersion": "5.20.2-3+deb8u11",
2023-03-14T18:54:13.784572939Z stdout F "installedVersion": "5.20.2-3+deb8u6",
2023-03-14T18:54:13.784576439Z stdout F "pkgName": "perl",
2023-03-14T18:54:13.784580039Z stdout F "primaryURL": "https://avd.aquasec.com/nvd/cve-2018-12015",
2023-03-14T18:54:13.784583439Z stdout F "severity": "HIGH",
2023-03-14T18:54:13.784586939Z stdout F "title": "perl: Directory traversal in Archive::Tar",
2023-03-14T18:54:13.784591039Z stdout P "vulnerabilityID": "CVE-2018-120
Where stdout P indicates a partial log entry.
Could be related to container log rotation.
Workaround: increase the kubelet default --container-log-max-size (see the sketch below).
trivy-operator could also support compression for the scan-job log output to avoid this issue.
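The kubelet limit can be raised with the --container-log-max-size flag or, on clusters that use a kubelet configuration file, the containerLogMaxSize field; a sketch of the latter, with an illustrative value only:

```yaml
# KubeletConfiguration sketch raising the per-container log size before rotation
# (the default is 10Mi), so large scan results are less likely to be split.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi    # illustrative value, tune for your workload
containerLogMaxFiles: 5      # number of rotated files kept (default)
```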
Update: after switching to "larger runners" on GitHub we haven't seen this issue.
I don't think it's related to log rotation, as the log file was only ~2MB and there was no second log file available on the node.
@chen-keinan containerd/containerd#7289. It is indeed a containerd problem.
This issue is stale because it has been labeled with inactivity.