
Docker base images should be included in the BOM

Open captn3m0 opened this issue 2 years ago • 15 comments

What would you like to be added: A simple Docker image built from the following Dockerfile:

FROM php:7.4-cli

COPY scan.php /

should result in an SBOM that includes the base image as a component:

pkg:docker/library/php@7.4-cli

Why is this needed: A container image's base image is also a "dependency". For popular base images, the base image name carries a lot of information, and it can be used to recursively look up other dependencies (ones that might have been included in the build process but might not be part of the final image).

I'm not sure how feasible this is, considering docker doesn't seem to store the base image names, but this would be a great addition.
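When the Dockerfile itself is available (which is not the case when scanning an already built image), the requested component can be derived directly from the FROM line. The sketch below is a hypothetical helper, not part of Syft: it follows the pkg:docker/library/... form used in this request, reads only the first FROM line (so multi-stage builds are not handled correctly), and does not handle registry ports or digest references.

```python
import re
from typing import Optional

# Hypothetical helper: matches "FROM [--flag=value ...] image[:tag]".
# Registries with ports (host:5000/img) and digest references (img@sha256:...)
# are NOT handled by this simple pattern.
FROM_RE = re.compile(
    r"^FROM\s+(?:--\S+\s+)*(?P<image>[^\s:@]+)(?::(?P<tag>[^\s@]+))?",
    re.IGNORECASE,
)


def base_image_purl(dockerfile_text: str) -> Optional[str]:
    """Return a pkg:docker PURL for the first FROM line, or None if there is none."""
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line.strip())
        if match is None:
            continue
        image = match.group("image")
        # Docker Hub official images live under the implicit "library" namespace.
        if "/" not in image:
            image = f"library/{image}"
        # An untagged FROM resolves to the "latest" tag.
        tag = match.group("tag") or "latest"
        return f"pkg:docker/{image}@{tag}"
    return None


print(base_image_purl("FROM php:7.4-cli\nCOPY scan.php /\n"))  # prints pkg:docker/library/php@7.4-cli
```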

captn3m0 (Sep 09 '22 16:09)

Hi @captn3m0 -- are you really looking to get php:7.4 properly cataloged with this request, in which case it's a duplicate of https://github.com/anchore/syft/issues/1197? Or is this actually a request to get the base image container added as a component?

kzantow (Oct 06 '22 21:10)

PHP here is just an example. This is a request for the latter (base images are ingredients, and should be included in an SBOM).

captn3m0 (Oct 07 '22 03:10)

Investigated this a bit. Docker does not return the base image ID, just the relevant build statements inherited from the base image. For example, the amazoncorretto:8u342-alpine3.16-jre image includes the following information about the upstream image:

ADD file:2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725 in /

The corresponding dockerfile has:

FROM alpine:3.16

And the hash can actually be found in the alpine:3.16 image: https://github.com/docker-library/repo-info/blob/master/repos/alpine/remote/3.16.md#alpine316---linux-amd64

I'm thinking about generating such hashes for common base images and publishing them on Rekor, so that Syft could pick them up via https://github.com/anchore/syft/issues/1159.

The intended mapping here would be:

2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725 ->
  pkg:docker/library/alpine@3.16
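A minimal sketch of how that mapping could be applied to an image's history entries, assuming a locally available lookup table (the table name and helper are hypothetical; in practice the table would be generated from data such as docker-library/repo-info or fetched from Rekor). The only entry below is the alpine:3.16 hash from this comment.

```python
import re
from typing import Optional

# Hypothetical lookup table from root-filesystem hashes to base image PURLs.
# The single entry is the alpine:3.16 example discussed above.
KNOWN_BASE_LAYERS = {
    "2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725":
        "pkg:docker/library/alpine@3.16",
}

# A base image's root filesystem shows up in history as "ADD file:<sha256> in /".
ADD_FILE_RE = re.compile(r"ADD file:(?P<sha>[0-9a-f]{64}) in /")


def base_image_from_history(created_by: str) -> Optional[str]:
    """Map one history entry to a base image PURL, if its hash is known."""
    match = ADD_FILE_RE.search(created_by)
    if match is None:
        return None
    return KNOWN_BASE_LAYERS.get(match.group("sha"))
```

Unknown hashes and unrelated history entries both yield None, so the lookup only adds information and never guesses.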

Which files to look up could be left to Syft, or perhaps I could publish a Bloom filter that lets Syft quickly check locally whether a given file is a "relevant" base image file.
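To illustrate the Bloom filter idea: a filter answers "definitely not a known base image hash" or "possibly one", so a positive result would still need confirming against the full published mapping. This is a minimal sketch; the bit-array size, number of hashes, and salted SHA-256 scheme are arbitrary assumptions, not a proposed format.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter()
bf.add("2a949686d9886ac7c10582a6c29116fd29d3077d02755e87e111870d63607725")
```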

captn3m0 (Oct 07 '22 06:10)

Hi team, any update on this feature request? It would be great if Docker base images could be added to the SBOM.

khan-a1 (Feb 03 '23 17:02)

Hi @khan-a1 and @captn3m0, sorry for the very long delay in replying. We would like to better understand your use case for including a reference to a Docker image in the SBOM itself. Are you familiar with the different scoping options you can specify with --scope?

We also have an open issue discussing ideas to expand the different scoping selections: https://github.com/anchore/syft/issues/15

Happy to re-engage on this issue and figure out how to move forward. Would you be able to join our community meeting at some point? It might be easier to talk things over live. https://github.com/anchore/syft/#join-our-community-meetings

tgerla (Sep 14 '23 20:09)

Will reply soon with a detailed proposal for why I think this is important.

I haven’t checked the scoping options yet.

I see there’s no meeting on Thursday the 21st, but I will try to join the one on the 28th to explain this better.

captn3m0 (Sep 14 '23 21:09)

I've looked at the scoping options and the various feature requests for them, and they don't fit this use case.

An SBOM should account for all the components that went into building the final image. Docker base images are a relevant artifact, IMO.

The primary use case for this comes from current limitations in Syft's binary matching capabilities, which mean that not everything in a base image gets detected. If anything is installed in the base image outside a "package" (very common behavior for official base images), Syft cannot easily detect it.

In such cases, the name of the base image itself is a huge help in the SBOM. At endoflife.date, we provide EOL information for various products alongside their PURLs, including PURLs for Docker images. See these search results. For example, for Composer we provide the following PURLs:

-   purl: pkg:composer/composer/composer
-   repology: php:composer # this expands to various packages listed at https://repology.org/project/php:composer/versions
-   purl: pkg:docker/library/composer
-   purl: pkg:github/composer/composer

Of these, the pkg:docker one is the relevant one. Say I have a PHP application that uses the official composer base image:

FROM composer:2.6.2
ADD . /src

If you were to build such a Dockerfile, Syft would not include the version of Composer in the SBOM, because Syft currently does not detect Composer. The official composer Dockerfile relies on a bash installer for Composer, which drops a few binaries into the image. I've reported such issues in the past, but I believe the binary classifier can only get us so far.

In such a scenario, since the SBOM doesn't include Composer, its use (potentially of an EOL version) goes unnoticed.

However, if Syft were to report the base image used here (pkg:docker/library/composer@2.6.2), it would provide a secondary means of detection.
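As a sketch of that secondary detection path: a consumer could strip the version from each component PURL in the SBOM and look up the versionless PURL in an index built from endoflife.date product data. The index and helper names here are hypothetical; the single entry uses the pkg:docker PURL listed for Composer above, and querying the actual EOL dates for the returned product and version is left out.

```python
from typing import Optional, Tuple

# Hypothetical index from versionless PURLs to endoflife.date product names.
PURL_TO_PRODUCT = {
    "pkg:docker/library/composer": "composer",
}


def eol_product_for(purl: str) -> Optional[Tuple[str, str]]:
    """Return (product, version) if the PURL's versionless form is indexed."""
    base, _, rest = purl.partition("@")
    # Drop any qualifiers (?...) or subpath (#...) that follow the version.
    version = rest.split("?", 1)[0].split("#", 1)[0]
    product = PURL_TO_PRODUCT.get(base)
    if product is None or not version:
        return None
    return product, version
```

With the base image reported as a component, eol_product_for("pkg:docker/library/composer@2.6.2") resolves to ("composer", "2.6.2") even though no Composer package was cataloged.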

tl;dr: Providing base images in the SBOM acts as a decent fallback, and includes important information (such as repository names, organization name, image version/tag) that is relevant to security teams.

captn3m0 (Sep 28 '23 15:09)

@captn3m0 can this issue be closed now that https://github.com/anchore/syft/issues/2267 has been merged, or did you have more in mind for this issue?

noqcks (Nov 09 '23 18:11)

@captn3m0 What else do you have in mind? Now that we have the annotations, do we want to try to build the base image "package" into the other formats? What's your ideal end state for how Syft surfaces base images, now that #2267 has been merged?

For best results, so that consumers of the document can find the base image via relationships, we should use the SPDX relationships: https://spdx.github.io/spdx-spec/v2.3/relationships-between-SPDX-elements/

CycloneDX: https://cyclonedx.org/docs/1.5/json/#metadata_component
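One possible CycloneDX 1.5 shape, sketched under assumptions: the scanned image is metadata.component, the base image is listed in components (type "container"), and a dependencies entry links the two. Modeling the base image as a dependency is a judgment call rather than something settled in this thread, and the helper name and the final image's PURL used in the test are hypothetical.

```python
import json


def cyclonedx_with_base_image(image_name: str, image_purl: str,
                              base_name: str, base_purl: str) -> dict:
    """Build a minimal CycloneDX 1.5 BOM that records the base image."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {
            # The subject of the BOM: the final, scanned image.
            "component": {"type": "container", "name": image_name,
                          "bom-ref": image_purl, "purl": image_purl},
        },
        "components": [
            # The base image appears as a regular component, so tooling that
            # only reads "components" still sees it.
            {"type": "container", "name": base_name,
             "bom-ref": base_purl, "purl": base_purl},
        ],
        "dependencies": [
            {"ref": image_purl, "dependsOn": [base_purl]},
        ],
    }


bom = cyclonedx_with_base_image("my-app", "pkg:oci/my-app?tag=1.0",
                                "composer", "pkg:docker/library/composer@2.6.2")
print(json.dumps(bom, indent=2))
```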

The other outstanding question is whether the annotations are the best source of truth for discovering this information. Can there be multiple images forming the full chain image:primary -> image:base1 -> image:base2 -> scratch?
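If such a chain is known, it could be expressed with one SPDX 2.3 relationship per adjacent pair. The sketch below uses DESCENDANT_OF, which is only one candidate relationship type; the SPDX IDs are hypothetical, and scratch is left out since it is not an image with contents.

```python
from typing import Dict, List


def spdx_chain_relationships(chain: List[str]) -> List[Dict[str, str]]:
    """chain is ordered from the final image to the root base image,
    e.g. [primary, base1, base2]; emit one relationship per adjacent pair."""
    return [
        {
            "spdxElementId": child,
            "relationshipType": "DESCENDANT_OF",
            "relatedSpdxElement": parent,
        }
        for child, parent in zip(chain, chain[1:])
    ]


rels = spdx_chain_relationships(
    ["SPDXRef-image-primary", "SPDXRef-image-base1", "SPDXRef-image-base2"]
)
```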

The annotations also need more properties to properly identify the image: ubuntu:xx.xx today can be different from ubuntu:xx.xx one month ago. We need both the digest and the version to pin down the exact image used.

spiffcs (Jan 18 '24 21:01)

What's your end Ideal state for syft in how it surfaces base images

A PURL that points to the correct base image. While #2294 is great, those are not components. Anything that is outside of the "components" part of the BOM will not get picked up by any other tooling.

Ideally, this would use the OCI PURL type, with the optional tag qualifier (https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#oci).

I like the idea of using relationships to document this better, but I'm not sure which of the available relationships will work best here. Base images can be counted as build dependencies, composition primitives, or even ancestors. Hard to pick something that works best for all cases.

Can there be multiple images that would build the full chain from image:primary -> image:base1 -> image:base2 -> scratch

Yes, this is another reason I'd prefer using components, since the BOM could then list all of the known base images (although finding them is a much harder problem).

We need both the digest and the version to pin down the exact image used.

This should be solvable with OCI PURLs. Here is a sample PURL from the spec that includes both the digest and the tag: pkg:oci/static@sha256%3A244fd47e07d10?repository_url=gcr.io/distroless/static&tag=latest
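A small sketch of building such a PURL, reproducing the spec sample: the digest becomes the version (with ":" percent-encoded as %3A), while repository_url and tag go into the qualifiers, sorted by key. The helper name is hypothetical, and leaving "/" unencoded in qualifier values simply mirrors the sample; it is not a full PURL encoder.

```python
from urllib.parse import quote


def oci_purl(name: str, digest: str, repository_url: str, tag: str) -> str:
    """Build a pkg:oci PURL that pins the digest (version) and records the tag."""
    # The ':' in "sha256:<hex>" must be percent-encoded in the version.
    version = quote(digest, safe="")
    # Qualifiers are emitted sorted by key; '/' is kept readable as in the sample.
    qualifiers = {"repository_url": repository_url, "tag": tag}
    query = "&".join(f"{key}={quote(value, safe='/')}"
                     for key, value in sorted(qualifiers.items()))
    return f"pkg:oci/{quote(name, safe='')}@{version}?{query}"


print(oci_purl("static", "sha256:244fd47e07d10", "gcr.io/distroless/static", "latest"))
# prints pkg:oci/static@sha256%3A244fd47e07d10?repository_url=gcr.io/distroless/static&tag=latest
```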

captn3m0 (Jan 19 '24 05:01)