fix: sync stdout buffer to file
Description
The trivy command is completed and as it is the main process the entire container is stopped before the stdout buffer is cleared, resulting in malformed output.
This patch has been running for over a month and I have not experienced any lingering scan vulnerability jobs which could not be removed by the operator anymore. Used kube_job_complete{namespace="trivy-system", condition="true", job_name=~"scan-vulnerabilityreport-[0-9a-f]+"} to alert for scan jobs with malformed output.
I did refactor the getCommandAndArgs function to deduplicate the logic for compressed and not compressed mode, it was essentially the same. Also ensured the options are in alphabetic order.
Related issues
- Close #1792.
Checklist
- [x] I've read the guidelines for contributing to this repository.
- [x] I've added tests that prove my fix is effective or that my feature works.
- [x] I've updated the documentation with the relevant information (if needed). Not applicable.
- [x] I've added usage information (if the PR introduces new options). Not applicable.
- [x] I've included a "before" and "after" example to the description (if the PR is a user interface change). Not applicable.
Also updated the *-expected-scan.yaml files for the envtest.
@chen-keinan could you trigger the workflow again? :pray:
I'm running the failing integration test tests/e2e/config/fs-sbom.yaml locally on the latest origin/HEAD (18e40db60a957cf233afeb442bddca62a03e45e9), but the test is failing there as well. I'm observing the following
- The
scan-vulnerabilityreportJobgets created - The
scan-vulnerabilityreport-xxxxxxPodget created and completes successfully. The logs get outputted as expected - The trivy operator outputs the following logs and does not create a
VulnerabiltyReport
2024-08-26T11:49:25Z DEBUG reconciler.scan job Pod must have been deleted {"job-results-processor": "trivy-system/scan-vulnerabilityreport-9845bdc5f"} 2024-08-26T11:49:25Z DEBUG reconciler.scan job Deleting complete scan job {"job": "trivy-system/scan-vulnerabilityreport-9845bdc5f", "owner": {"apiVersion": "v1", "kind": "Pod", "namespace": "e2e-test", "name": "my-pod"}} 2024-08-26T11:49:25Z DEBUG reconciler.scan job Ignoring cached job that must have been deleted {"job": "trivy-system/scan-vulnerabilityreport-9845bdc5f"}
I'm not sure how to continue from here. The failing integration test seems to test pkg/plugins/trivy/filesystem.go which I did not edit.
This PR is stale because it has been labeled with inactivity.
Are there any plans to merge these changes? We are also experiencing reconciliation errors with the trivy operator every now and then, because scan job logs are truncated and therefore not decompressible.
best regards
@hsedr You can use ghcr.io/daanschipper/trivy-operator:0.22.0-sync0.0.3, that has the above changes and some dependency upgrades.
@daanschipper Thanks for your contribution! I'll take it into consideration. But I think it would be best if this fix ends up in this repository.
Agreed, resolved the merge conflict. This PR received no attention for months, therefore I opted to publish my changes and use that.
@daanschipper thanks for the contribution! I'll take a look at this PR
@daanschipper thanks for your work! Looks great! I left a really small comment to refactor this block.
Although there is a mixing refactoring and a fix (improvement with sync) in the same PR, there isn’t much logic involved here, so it’s fine.
@simar7 wdyt?
hi @daanschipper thanks for the PR. It lgtm, but just curious if we can assume the sync utility to be widely available on most machines? Otherwise we would risk breakage.
sync is a standard system call in the unix operating system, see https://linux.die.net/man/8/sync.
@simar7 can this be merged?