Add support for user provided content "hints" file

Open wagoodman opened this issue 4 years ago • 10 comments

syft should be aware of user-specified content files, which can override known packages in a catalog or add new ones.

This should be in feature parity with https://github.com/anchore/enterprise/issues/185

wagoodman avatar Jun 01 '20 17:06 wagoodman

  • schema: https://github.com/anchore/bom-import-tool/blob/master/anchore_bom_importer/data/bom-input-schema.json
  • example: https://github.com/anchore/cisco-bom-importer/blob/master/example.json

Consider supporting YAML, TOML, and JSON (with struct tags).

wagoodman avatar Jul 24 '20 14:07 wagoodman

Consider using CycloneDX for the input format as well: https://github.com/anchore/syft/issues/67

wagoodman avatar Aug 10 '20 15:08 wagoodman

We should consider allowing this functionality to live downstream (outside) of syft. Syft catalogs what was actually found, and if the output needs modification, a consumer can perform it. It isn't immediately clear that this is syft's responsibility.

Since this is a security tool that can be used to verify supply-chain concerns, it is reasonable to assert that the SBOM output generated by syft should be verified by syft. Allowing a "catch-all" hints file to add, modify, or remove packages outside of syft's observations would start to break this assumption.

wagoodman avatar Aug 17 '21 14:08 wagoodman

Somewhat contradictory to the above comment, I think there is room for adding "exceptional" content to syft output via configuration. How we do this matters: for example, we could label individual packages/elements with "manually-added" or similar to track in the SBOM what was "magically" added. We want the SBOMs we generate to be as reproducible as possible, which means being transparent about what the inputs were to generate the SBOM (including content hints).
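
A minimal sketch of the labeling idea, not syft's actual behavior or schema. The `origin` field, the `inputs` section, and the function name `annotate_declared` are all hypothetical, and the package values are placeholders. It marks each user-declared package and records a digest of the file that declared it, so the extra input stays visible in the output:

```python
import hashlib
import json


def annotate_declared(discovered, declared, hints_bytes):
    """Combine packages, marking user-declared ones and recording the input digest."""
    # Hypothetical "origin" field: distinguishes what was found from what was declared.
    packages = [dict(p, origin="discovered") for p in discovered]
    packages += [dict(p, origin="manually-added") for p in declared]
    return {
        # Hypothetical "inputs" section: makes the declared-packages file a tracked input.
        "inputs": {"declared_packages_sha256": hashlib.sha256(hints_bytes).hexdigest()},
        "packages": packages,
    }


# Placeholder data for illustration only.
hints_bytes = b'[{"name": "openjdk", "version": "17.0.2"}]'
sbom = annotate_declared(
    discovered=[{"name": "musl", "version": "1.2.2"}],
    declared=json.loads(hints_bytes),
    hints_bytes=hints_bytes,
)
print(json.dumps(sbom, indent=2))
```

Recording the digest means two runs over the same image with different declared-packages files produce visibly different SBOMs, which is one way to keep the reproducibility goal intact.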

It could be that all content hints get injected into a separate SBOM that gets referenced in the main SBOM that contains what was discovered.

As a side note, "content hints" sounds very optional/conditional/implicit, whereas the mechanism being described here should imply that what is being used is explicit and intentional. We should consider naming this feature something other than "content hints".

wagoodman avatar Sep 14 '21 15:09 wagoodman

We could simplify the functionality somewhat to make the solution space more tractable: what if we only allowed the addition of packages and maybe the removal of packages, but not the modification of packages?

Even just disallowing package mutations makes this much simpler (you don't need to pairwise match every hint package against every discovered package and figure out which fields should be considered).
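
To illustrate why add/remove-only is simpler, here is a rough sketch under assumed names (`apply_overrides` and the dict-based package shape are hypothetical, not syft APIs). Removal needs only an exact key lookup and addition is a plain append, so no field-by-field merge logic is required:

```python
def apply_overrides(discovered, additions, removals):
    """Apply add/remove-only overrides; existing packages are never mutated."""
    # Removal matches on an exact (name, version) key, an assumption for this sketch.
    removed = {(r["name"], r["version"]) for r in removals}
    kept = [p for p in discovered if (p["name"], p["version"]) not in removed]
    # Additions are appended as-is; nothing is merged into discovered entries.
    return kept + list(additions)


# Placeholder data for illustration only.
discovered = [
    {"name": "busybox", "version": "1.35.0"},
    {"name": "musl", "version": "1.2.2"},
]
result = apply_overrides(
    discovered,
    additions=[{"name": "openjdk", "version": "17.0.2"}],
    removals=[{"name": "busybox", "version": "1.35.0"}],
)
print([p["name"] for p in result])
```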

wagoodman avatar Jan 06 '22 22:01 wagoodman

That is in line with the current anchore engine behavior, which can only add new entries to the list, not modify an existing entry.

zhill avatar Jan 06 '22 22:01 zhill

From refinement:

  • We probably shouldn't call this "content hints"
  • Possible implementation path: implement template output which would allow the user to add packages via the template

Note: this is blocking removal of the existing python code for the analysis in anchore-engine.

wagoodman avatar Jan 24 '22 18:01 wagoodman

Is CycloneDX the expected input format? (And if so, is this just a dupe of #737?)

I'm considering containers like eclipse-temurin:17-jre-alpine, which fetch a trusted binary that existing catalogers don't understand.

My naive and ideal solution would be dropping a CPE+purl in a simple text format at a path like /opt/java/openjdk/breadcrumb-for-syft.txt. That is feasible with echo in the same layer as the wget.
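
One possible shape for such a breadcrumb, sketched under stated assumptions: no format has been agreed in this thread, so the one-identifier-per-line layout, the `#` comment syntax, the function name `parse_breadcrumb`, and the sample identifiers are all hypothetical. The only real conventions relied on are that CPE strings start with `cpe:` and package URLs start with `pkg:`:

```python
def parse_breadcrumb(text):
    """Parse a hypothetical breadcrumb file: one CPE or purl per line."""
    result = {"cpes": [], "purls": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("cpe:"):
            result["cpes"].append(line)
        elif line.startswith("pkg:"):
            result["purls"].append(line)
        else:
            raise ValueError(f"unrecognized breadcrumb line: {line!r}")
    return result


# Illustrative content only; the identifiers are placeholders.
sample = """\
# written by the same layer that fetched the binary
cpe:2.3:a:example_vendor:example_jre:17.0.2:*:*:*:*:*:*:*
pkg:generic/example-jre@17.0.2
"""
breadcrumb = parse_breadcrumb(sample)
print(breadcrumb)
```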

thepwagner avatar Mar 28 '22 23:03 thepwagner

@wagoodman just a ping for when you get back: this "hints" or new SBOM cataloger is a common feature request that we're seeing a lot more of this year. I want to see if we can agree on a baseline for what the initial feature looks like, so I can make a PR that supports at least some user-related configuration for custom CPE generation while also allowing users to fill in packages that we cannot detect at the moment (binary analysis or db parse for image scan).

spiffcs avatar Jul 12 '22 20:07 spiffcs

We're using https://github.com/shopify/hansel as a hack today. For deb/apk/rpm-based distributions it generates empty packages that serve as simple hints: name+version. If there's a way we can accept+encode custom CPEs in the packages or you have any other feedback, please open an issue!

thepwagner avatar Jul 12 '22 23:07 thepwagner

During a recent discussion, we wanted to capture that format-specific fields should be considered as part of this hints file.

Ex:

I know package x has supplier: foobar.
Given this hints file, I expect the SPDX output of the SBOM to have foobar as the supplier, regardless of other syft logic.
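
A small sketch of that expectation, assuming a hypothetical overrides mapping keyed by package name (`apply_field_overrides` and the dict shapes are illustrative, not syft or SPDX structures). User-supplied fields are applied last so they win over whatever earlier logic computed:

```python
def apply_field_overrides(packages, overrides):
    """Return copies of packages with user-supplied fields applied last."""
    result = []
    for pkg in packages:
        merged = dict(pkg)
        # Fields from the overrides take precedence over computed values.
        merged.update(overrides.get(pkg["name"], {}))
        result.append(merged)
    return result


# Placeholder data: "x" and "foobar" come from the example above, "y" is made up.
packages = [
    {"name": "x", "version": "1.0", "supplier": "guessed-by-tool"},
    {"name": "y", "version": "2.0", "supplier": "someone"},
]
overridden = apply_field_overrides(packages, {"x": {"supplier": "foobar"}})
print(overridden[0]["supplier"])
```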

spiffcs avatar Aug 01 '23 20:08 spiffcs