trivy-operator icon indicating copy to clipboard operation
trivy-operator copied to clipboard

fix: sync stdout buffer to file

Open daanschipper opened this issue 1 year ago • 4 comments

Description

The trivy command is completed and as it is the main process the entire container is stopped before the stdout buffer is cleared, resulting in malformed output.

This patch has been running for over a month and I have not experienced any lingering scan vulnerability jobs which could not be removed by the operator anymore. Used kube_job_complete{namespace="trivy-system", condition="true", job_name=~"scan-vulnerabilityreport-[0-9a-f]+"} to alert for scan jobs with malformed output.

I did refactor the getCommandAndArgs function to deduplicate the logic for compressed and not compressed mode, it was essentially the same. Also ensured the options are in alphabetic order.

Related issues

  • Close #1792.

Checklist

  • [x] I've read the guidelines for contributing to this repository.
  • [x] I've added tests that prove my fix is effective or that my feature works.
  • [x] I've updated the documentation with the relevant information (if needed). Not applicable.
  • [x] I've added usage information (if the PR introduces new options). Not applicable.
  • [x] I've included a "before" and "after" example to the description (if the PR is a user interface change). Not applicable.

daanschipper avatar Jul 19 '24 09:07 daanschipper

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 19 '24 09:07 CLAassistant

Also updated the *-expected-scan.yaml files for the envtest.

daanschipper avatar Jul 24 '24 20:07 daanschipper

@chen-keinan could you trigger the workflow again? :pray:

daanschipper avatar Aug 05 '24 13:08 daanschipper

I'm running the failing integration test tests/e2e/config/fs-sbom.yaml locally on the latest origin/HEAD (18e40db60a957cf233afeb442bddca62a03e45e9), but the test is failing there as well. I'm observing the following

  • The scan-vulnerabilityreport Job gets created
  • The scan-vulnerabilityreport-xxxxxx Pod get created and completes successfully. The logs get outputted as expected
  • The trivy operator outputs the following logs and does not create a VulnerabiltyReport
2024-08-26T11:49:25Z    DEBUG    reconciler.scan job    Pod must have been deleted    {"job-results-processor": "trivy-system/scan-vulnerabilityreport-9845bdc5f"}                                                                                                                                                      2024-08-26T11:49:25Z    DEBUG    reconciler.scan job    Deleting complete scan job    {"job": "trivy-system/scan-vulnerabilityreport-9845bdc5f", "owner": {"apiVersion": "v1", "kind": "Pod", "namespace": "e2e-test", "name": "my-pod"}}                                                                               2024-08-26T11:49:25Z    DEBUG    reconciler.scan job    Ignoring cached job that must have been deleted    {"job": "trivy-system/scan-vulnerabilityreport-9845bdc5f"} 

I'm not sure how to continue from here. The failing integration test seems to test pkg/plugins/trivy/filesystem.go which I did not edit.

daanschipper avatar Aug 26 '24 12:08 daanschipper

This PR is stale because it has been labeled with inactivity.

github-actions[bot] avatar Dec 11 '24 00:12 github-actions[bot]

Are there any plans to merge these changes? We are also experiencing reconciliation errors with the trivy operator every now and then, because scan job logs are truncated and therefore not decompressible.

best regards

hsedr avatar Jan 31 '25 17:01 hsedr

@hsedr You can use ghcr.io/daanschipper/trivy-operator:0.22.0-sync0.0.3, that has the above changes and some dependency upgrades.

daanschipper avatar Jan 31 '25 17:01 daanschipper

@daanschipper Thanks for your contribution! I'll take it into consideration. But I think it would be best if this fix ends up in this repository.

hsedr avatar Jan 31 '25 17:01 hsedr

Agreed, resolved the merge conflict. This PR received no attention for months, therefore I opted to publish my changes and use that.

daanschipper avatar Jan 31 '25 18:01 daanschipper

@daanschipper thanks for the contribution! I'll take a look at this PR

afdesk avatar Feb 03 '25 05:02 afdesk

@daanschipper thanks for your work! Looks great! I left a really small comment to refactor this block.

Although there is a mixing refactoring and a fix (improvement with sync) in the same PR, there isn’t much logic involved here, so it’s fine. @simar7 wdyt?

afdesk avatar Feb 03 '25 10:02 afdesk

hi @daanschipper thanks for the PR. It lgtm, but just curious if we can assume the sync utility to be widely available on most machines? Otherwise we would risk breakage.

simar7 avatar Feb 03 '25 23:02 simar7

sync is a standard system call in the unix operating system, see https://linux.die.net/man/8/sync.

daanschipper avatar Feb 04 '25 05:02 daanschipper

@simar7 can this be merged?

daanschipper avatar Feb 19 '25 12:02 daanschipper