syft convert: users should be able to selectively filter an SBOM on conversion

Open spiffcs opened this issue 8 months ago • 3 comments

What would you like to be added: When converting from syft-json to other SBOM formats (SPDX, CycloneDX), users should be able to drop components that inflate the size of the SBOM, so that the resulting documents are smaller.

Examples:

CycloneDX

The CycloneDX format is a flat list of components, each labeled with a specific type:

syft convert img.syft.json -o cyclonedx-json=cdx.output.json --exclude-component-types=device-driver,file

The above command would filter out components with the device-driver and file types from the final cdx document.
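
To make the intended behavior concrete, here is a rough Python sketch of the filtering step applied to a CycloneDX JSON document. It is not how syft would implement it (the `--exclude-component-types` flag above is only a suggestion), and `exclude_component_types` is a hypothetical helper name. It assumes the standard CycloneDX JSON layout with top-level `components` and `dependencies` arrays, and it ignores nested components.

```python
def exclude_component_types(bom: dict, excluded: set) -> dict:
    """Return a shallow copy of a CycloneDX BOM without the excluded component types."""
    components = bom.get("components", [])
    # Remember which bom-refs disappear so dependency entries can be pruned too.
    removed_refs = {
        c["bom-ref"]
        for c in components
        if c.get("type") in excluded and "bom-ref" in c
    }
    slim = dict(bom)
    slim["components"] = [c for c in components if c.get("type") not in excluded]
    if "dependencies" in bom:
        slim["dependencies"] = [
            {**d, "dependsOn": [r for r in d.get("dependsOn", []) if r not in removed_refs]}
            for d in bom["dependencies"]
            if d.get("ref") not in removed_refs
        ]
    return slim


bom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"bom-ref": "pkg-1", "type": "library", "name": "openssl"},
        {"bom-ref": "file-1", "type": "file", "name": "/etc/hosts"},
    ],
    "dependencies": [
        {"ref": "pkg-1", "dependsOn": ["file-1"]},
        {"ref": "file-1"},
    ],
}
slim = exclude_component_types(bom, {"device-driver", "file"})
print([c["name"] for c in slim["components"]])  # ['openssl']
```

Pruning `dependencies` along with `components` is a design choice in this sketch, so that the slimmer document does not reference components that no longer exist.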

Note: this is just a suggested design to convey the change.

The issue should be used to discuss what the ideal UX is for this kind of filter.

SPDX

SPDX 2.3 has a more distributed model, where things like packages and files each have their own top-level fields.
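
For the SPDX 2.3 JSON layout, an equivalent filter would operate on whole top-level fields rather than on a per-component type. The following Python sketch drops the `files` section and any relationship that points at a removed file. `drop_spdx_files` is a hypothetical name, and the sketch assumes the usual SPDX 2.3 JSON keys (`files`, `relationships`, `SPDXID`, `spdxElementId`, `relatedSpdxElement`); it does not try to clean up other places where a package may list its files.

```python
def drop_spdx_files(doc: dict) -> dict:
    """Return a shallow copy of an SPDX 2.3 JSON document without its files."""
    removed_ids = {f["SPDXID"] for f in doc.get("files", []) if "SPDXID" in f}
    # Copy everything except the top-level files section.
    slim = {key: value for key, value in doc.items() if key != "files"}
    if "relationships" in doc:
        # Drop relationships whose source or target was a removed file.
        slim["relationships"] = [
            r
            for r in doc["relationships"]
            if r.get("spdxElementId") not in removed_ids
            and r.get("relatedSpdxElement") not in removed_ids
        ]
    return slim


doc = {
    "spdxVersion": "SPDX-2.3",
    "packages": [{"SPDXID": "SPDXRef-Package-a", "name": "a"}],
    "files": [{"SPDXID": "SPDXRef-File-1", "fileName": "/usr/bin/a"}],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-Package-a",
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": "SPDXRef-File-1",
        },
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": "SPDXRef-Package-a",
        },
    ],
}
slim_doc = drop_spdx_files(doc)
print("files" in slim_doc, len(slim_doc["relationships"]))  # False 1
```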

SPDX 3.0 moves this to a more centralized design with a top-level software field that contains different classes. There are separate top-level fields for security, licensing, dataset, AI, build, and other categories, with similar subclasses.

Challenges

Ideally, similar knobs should be added for CycloneDX and SPDX. The design should probably center on the syft-json objects that can be filtered and their counterparts in each output format.

Why is this needed: Larger images lead to larger SBOMs, and power users would like to keep the data-rich syft-json format while publishing slimmer documents for consumption.

spiffcs avatar May 14 '25 18:05 spiffcs

I would add that disabling certain features could also let us skip the associated work. For example, if we aren't including files, we don't need to hash files.

kzantow avatar May 14 '25 21:05 kzantow

Some early thoughts on this. We could keep it rather simple:

convert:
  # can be drop/keep (or a generic selector in the future, TBD on format)
  files: 'drop'
  relationships: 'keep'
  packages: 'keep'
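
As a rough illustration of the simple variant, the Python sketch below applies section-level `drop`/`keep` actions to a syft-json document before conversion. The mapping from config names to syft-json keys (`artifacts`, `files`, `artifactRelationships`) is an assumption about the syft-json layout, and `apply_section_actions` is a hypothetical name, not an existing syft function.

```python
# Assumed mapping from the proposed config names to syft-json top-level keys.
SECTION_KEYS = {
    "packages": "artifacts",
    "files": "files",
    "relationships": "artifactRelationships",
}


def apply_section_actions(sbom: dict, convert_cfg: dict) -> dict:
    """Return a shallow copy of the SBOM with every dropped section emptied."""
    slim = dict(sbom)
    for section, action in convert_cfg.items():
        if section not in SECTION_KEYS:
            raise ValueError(f"unknown section: {section!r}")
        if action not in ("drop", "keep"):
            raise ValueError(f"unknown action for {section!r}: {action!r}")
        if action == "drop":
            slim[SECTION_KEYS[section]] = []
    return slim


sbom = {
    "artifacts": [{"name": "openssl", "type": "rpm"}],
    "files": [{"location": {"path": "/etc/hosts"}}],
    "artifactRelationships": [{"type": "contains"}],
}
cfg = {"files": "drop", "relationships": "keep", "packages": "keep"}
slim_sbom = apply_section_actions(sbom, cfg)
print(len(slim_sbom["files"]), len(slim_sbom["artifacts"]))  # 0 1
```

Note that this sketch leaves relationships that point at dropped files in place; whether those should be pruned as well is part of the open UX question.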

Or something more flexible:

convert:
  filter:

    files:
      - type: directory
      - type: symlink
      - mimetype: application/vnd.microsoft.portable-executable
        action: keep    # default is to drop

    packages:
      - name: log4j
      - type: rpm
        name: myzer
      - name: super*
        action: keep    # default is to drop

    relationships:
      - type: "*"    # quoted: a bare * starts a YAML alias
      - type: dependency-of
        action: keep
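
The rule semantics are not pinned down above, so here is one possible reading as a Python sketch: each rule matches when all of its fields match (with glob patterns), the last matching rule decides the action, and items that match no rule are kept. Those three choices are assumptions made for illustration, and `apply_rules` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def rule_matches(rule: dict, item: dict) -> bool:
    """A rule matches when every field other than 'action' glob-matches the item."""
    return all(
        fnmatchcase(str(item.get(field, "")), str(pattern))
        for field, pattern in rule.items()
        if field != "action"
    )


def apply_rules(items: list, rules: list) -> list:
    """Keep items whose last matching rule says 'keep' (or that match no rule)."""
    kept = []
    for item in items:
        action = "keep"  # assumption: unmatched items are kept
        for rule in rules:
            if rule_matches(rule, item):
                action = rule.get("action", "drop")  # default is to drop
        if action == "keep":
            kept.append(item)
    return kept


package_rules = [
    {"name": "log4j"},
    {"type": "rpm", "name": "myzer"},
    {"name": "super*", "action": "keep"},
]
packages = [
    {"name": "log4j", "type": "java-archive"},
    {"name": "myzer", "type": "rpm"},
    {"name": "myzer", "type": "deb"},
    {"name": "superlib", "type": "rpm"},
]
kept_packages = apply_rules(packages, package_rules)
print([(p["name"], p["type"]) for p in kept_packages])
# [('myzer', 'deb'), ('superlib', 'rpm')]
```

With last-match-wins, the `relationships` example reads naturally: the wildcard rule drops everything, and the later `dependency-of` rule keeps just that type.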

wagoodman avatar May 15 '25 19:05 wagoodman

From @kzantow

convert:
  filter:
   - "-cyclonedx:thing"
   - "-packages:rpm"
   - "+licenses:content"   # what does this mean?

I feel this is shorthand for:

convert:
  filter:
   - cyclonedx:
       property: thing
   - packages:
       type: rpm
   - licenses:              # what does this mean?
       include: content
     action: keep
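
To check that the two forms really line up, here is a small Python sketch that expands the shorthand strings into the longer structure. The per-section field names (`property`, `type`, `include`) are taken from the expansion above and are otherwise unconfirmed; `expand_shorthand` and `FIELD_FOR_SECTION` are hypothetical names.

```python
# Field each section's shorthand value maps to, per the expansion above (assumed).
FIELD_FOR_SECTION = {
    "cyclonedx": "property",
    "packages": "type",
    "licenses": "include",
}


def expand_shorthand(entry: str) -> dict:
    """Expand '-section:value' or '+section:value' into the long rule form."""
    if not entry or entry[0] not in ("+", "-"):
        raise ValueError(f"filter must start with '+' or '-': {entry!r}")
    section, sep, value = entry[1:].partition(":")
    if not sep or not section or not value:
        raise ValueError(f"expected '<section>:<value>': {entry!r}")
    if section not in FIELD_FOR_SECTION:
        raise ValueError(f"unknown section: {section!r}")
    rule = {section: {FIELD_FOR_SECTION[section]: value}}
    if entry[0] == "+":
        rule["action"] = "keep"  # '-' relies on the default action, drop
    return rule


for text in ["-cyclonedx:thing", "-packages:rpm", "+licenses:content"]:
    print(expand_shorthand(text))
# {'cyclonedx': {'property': 'thing'}}
# {'packages': {'type': 'rpm'}}
# {'licenses': {'include': 'content'}, 'action': 'keep'}
```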

wagoodman avatar May 15 '25 19:05 wagoodman