
Add ability to see the first location a package was added

Open wagoodman opened this issue 1 year ago • 8 comments

Adds a squashed-with-all-layers resolver, which acts like the squashed resolver but additionally returns every instance of a path found in the other layers. Combined with further changes that record the layer index directly in locations, this makes it possible to determine the first location where a package was introduced.
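The resolver behavior described above can be illustrated with a toy model. This is only a sketch and not syft's implementation: layers are plain dicts, a `None` value stands in for a whiteout, and the names `squash` and `resolve_with_all_layers` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption): a layer maps path -> file content,
# and None marks a whiteout (the path was deleted in that layer).
Layer = Dict[str, Optional[str]]


def squash(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers bottom-up and return the final visible filesystem."""
    visible: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                visible.pop(path, None)  # whiteout removes the path
            else:
                visible[path] = content
    return visible


def resolve_with_all_layers(layers: List[Layer], path: str) -> List[Tuple[int, str]]:
    """Return (layer_index, content) for every layer holding `path`,
    but only when `path` is visible in the squashed filesystem."""
    if path not in squash(layers):
        return []
    return [
        (index, content)
        for index, layer in enumerate(layers)
        for content in [layer.get(path)]
        if content is not None
    ]


# Hypothetical three-layer image, loosely mirroring base + two `apk add` steps.
layers = [
    {"/lib/apk/db/installed": "zlib", "/tmp/scratch": "x"},
    {"/lib/apk/db/installed": "zlib wget"},
    {"/lib/apk/db/installed": "zlib wget curl", "/tmp/scratch": None},
]
instances = resolve_with_all_layers(layers, "/lib/apk/db/installed")
```

In this model the package database is returned once per layer that wrote it, while a path deleted by a later layer returns nothing because it is absent from the squashed view.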

For example:

# Dockerfile for test:latest
FROM alpine:latest
RUN apk add wget
RUN apk add curl

When running syft...

$ syft -o json -s squashed-with-all-layers test:latest  -vvv
...
[0000] DEBUG discovered 58 packages cataloger=apkdb-cataloger
[0000] DEBUG found path duplicate of /lib/ld-musl-x86_64.so.1
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
...
[0000] TRACE merging similar packages id=291d1267b40d636f purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=alpine-baselayout&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d9700f02cf26e8b8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=623d53216342d45e purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=256fc96b4a8c4da8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=92b19c7750fb559d purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2b5e23d349b556cf purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b805d823ae624f04 purl=pkg:apk/alpine/ca-certificates-bundle@20220614-r4?arch=x86_64&upstream=ca-certificates&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d3084c788891fb28 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2a95f0251fba7a33 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b15247aafcd4a647 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=94014313cfcd2b71 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e5f757b0df1f62bc purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e903138d19e85b80 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=pax-utils&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=f71ecf5267e6c37b purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=musl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=8126b232e2d3c608 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=libc-dev&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=291d1267b40d636f purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=alpine-baselayout&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d9700f02cf26e8b8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=623d53216342d45e purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=256fc96b4a8c4da8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=92b19c7750fb559d purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2b5e23d349b556cf purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b805d823ae624f04 purl=pkg:apk/alpine/ca-certificates-bundle@20220614-r4?arch=x86_64&upstream=ca-certificates&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d3084c788891fb28 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2a95f0251fba7a33 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b15247aafcd4a647 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=94014313cfcd2b71 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e5f757b0df1f62bc purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e903138d19e85b80 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=pax-utils&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=f71ecf5267e6c37b purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=musl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=8126b232e2d3c608 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=libc-dev&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=58d60d9b7d1565f1 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=3841a3199a1ee118 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e40c4f862e3949e8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=971b42d7909ea972 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3

# proceeds to output 25 packages, not 58

You'll see merged location elements for each package:

{
  "id": "94014313cfcd2b71",
  "name": "zlib",
  "version": "1.2.13-r0",
  "type": "apk",
  "foundBy": "apkdb-cataloger",
  "locations": [
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:0d71e44edab1e63f802dfd59cbf8c128c4f89f2ae3c4edb79475678dcedb5bff"
    },
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:a2ea955c0abfa7fb734e0991ef02fb4e4f35e8090ae76cd6f14dc58d037fa23e"
    },
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:f1417ff83b319fbdae6dd9cd6d8c9c88002dcd75ecf6ec201c8c6894681cf2b5"
    }
  ],
  "licenses": [
    "Zlib"
  ],
  "language": "",
  "cpes": [
    "cpe:2.3:a:zlib:zlib:1.2.13-r0:*:*:*:*:*:*:*"
  ],
  "purl": "pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3",
...

TODO:

  • [ ] add tests 🧛 🩸
  • [ ] add layer index to location?
  • [ ] sort slice from location set not lexically, but by layer order.
  • [ ] there are a lot of "found path duplicate of" log entries, which hints at an issue with relationship creation for these duplicate packages.
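The two location-related TODO items (layer index, ordering by layer) could look roughly like the following sketch. It assumes the caller already knows the image's layer order as a list of layer IDs; the `layerIndex` field name and the helper `sort_locations_by_layer` are hypothetical, and the short digests are placeholders.

```python
from typing import Any, Dict, List


def sort_locations_by_layer(
    locations: List[Dict[str, str]], layer_order: List[str]
) -> List[Dict[str, Any]]:
    """Annotate each location with its layer index and sort by that index.

    Locations whose layerID is not in `layer_order` are placed last.
    """
    rank = {layer_id: index for index, layer_id in enumerate(layer_order)}
    unknown = len(layer_order)
    annotated = [
        {**loc, "layerIndex": rank.get(loc["layerID"], unknown)}
        for loc in locations
    ]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(annotated, key=lambda loc: loc["layerIndex"])


# Placeholder digests: the image order here is deliberately not lexical.
layer_order = ["sha256:bbb", "sha256:ccc", "sha256:aaa"]
locations = [
    {"path": "/lib/apk/db/installed", "layerID": "sha256:aaa"},
    {"path": "/lib/apk/db/installed", "layerID": "sha256:bbb"},
    {"path": "/lib/apk/db/installed", "layerID": "sha256:ccc"},
]
ordered = sort_locations_by_layer(locations, layer_order)
first_location = ordered[0]  # earliest layer in which the path appears
```

With locations ordered this way, the first element is the earliest layer in which the package's evidence appears, which is what a consumer needs to answer "where was this first added?".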

Open question:

  • Should we omit packages for certain ecosystems that have been found in previous layers but are known to be the same? E.g. deb/apk/rpm packages are in a single DB, so adding any new package will make the previously installed packages look like they've been installed again, which isn't what's happening here.
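One possible way to approach this open question, sketched under assumptions: if the package DB contents can be read per layer, diffing consecutive snapshots attributes each package to the layer that actually added it. The function `packages_added_per_layer` and the package sets below are illustrative only, not part of this PR.

```python
from typing import List, Set


def packages_added_per_layer(snapshots: List[Set[str]]) -> List[List[str]]:
    """For each layer's DB snapshot, list the packages not present in the previous one."""
    added_per_layer: List[List[str]] = []
    previous: Set[str] = set()
    for current in snapshots:
        added_per_layer.append(sorted(current - previous))
        previous = current
    return added_per_layer


# Illustrative snapshots for: FROM alpine, RUN apk add wget, RUN apk add curl.
# Real installs would also pull in dependencies; they are omitted here.
snapshots = [
    {"musl", "busybox", "zlib"},
    {"musl", "busybox", "zlib", "wget"},
    {"musl", "busybox", "zlib", "wget", "curl"},
]
added = packages_added_per_layer(snapshots)
```

Under this approach, base-layer packages would not be reported again for the later layers just because the shared DB file was rewritten.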

Problems:

  • This will report packages that get removed and are not logically in the squashed representation (introducing FPs relative to the squashed representation).
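A rough sketch of how such false positives could be separated out, assuming both the all-layers result and the plain squashed result are available as sets of (name, version) keys. The helper `split_by_squashed` is hypothetical, and the wget version is a placeholder.

```python
from typing import Dict, Iterable, List, Set, Tuple

PackageKey = Tuple[str, str]  # (name, version)


def split_by_squashed(
    all_layer_packages: Iterable[PackageKey],
    squashed_packages: Set[PackageKey],
) -> Dict[str, List[PackageKey]]:
    """Partition packages into those still present in the squashed view and those removed."""
    result: Dict[str, List[PackageKey]] = {"present": [], "removed": []}
    for key in sorted(set(all_layer_packages)):
        bucket = "present" if key in squashed_packages else "removed"
        result[bucket].append(key)
    return result


# zlib survives into the final image; wget was installed and later removed.
all_layer_packages = [
    ("zlib", "1.2.13-r0"),
    ("wget", "0.0-example"),  # placeholder version
    ("zlib", "1.2.13-r0"),    # seen again in a later layer
]
squashed_packages = {("zlib", "1.2.13-r0")}
result = split_by_squashed(all_layer_packages, squashed_packages)
```

Packages landing in the `removed` bucket are the ones that would need to be dropped or explicitly marked to stay consistent with the squashed representation.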

Closes #435

wagoodman avatar Apr 07 '23 19:04 wagoodman

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux
goarch: amd64
pkg: github.com/anchore/syft/test/integration
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
                                                          │ ./.tmp/benchmark-14e8cb4.txt │
                                                          │            sec/op            │
ImagePackageCatalogers/alpmdb-cataloger-2                                   11.80m ± 24%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                             856.1µ ±  2%
ImagePackageCatalogers/python-package-cataloger-2                           3.097m ±  1%
ImagePackageCatalogers/php-composer-installed-cataloger-2                   695.8µ ±  1%
ImagePackageCatalogers/javascript-package-cataloger-2                       356.7µ ±  2%
ImagePackageCatalogers/dpkgdb-cataloger-2                                   511.1µ ±  1%
ImagePackageCatalogers/rpm-db-cataloger-2                                   491.1µ ±  3%
ImagePackageCatalogers/java-cataloger-2                                     10.73m ±  1%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                     8.390µ ±  2%
ImagePackageCatalogers/apkdb-cataloger-2                                    556.0µ ±  0%
ImagePackageCatalogers/go-module-binary-cataloger-2                         18.95µ ±  2%
ImagePackageCatalogers/dotnet-deps-cataloger-2                              981.6µ ±  1%
ImagePackageCatalogers/portage-cataloger-2                                  344.5µ ±  1%
ImagePackageCatalogers/nix-store-cataloger-2                                222.9µ ±  2%
ImagePackageCatalogers/sbom-cataloger-2                                     110.8µ ±  0%
ImagePackageCatalogers/binary-cataloger-2                                   190.1µ ±  0%
geomean                                                                     451.0µ

                                                          │ ./.tmp/benchmark-14e8cb4.txt │
                                                          │             B/op             │
ImagePackageCatalogers/alpmdb-cataloger-2                                   5.064Mi ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                             123.8Ki ± 0%
ImagePackageCatalogers/python-package-cataloger-2                           947.4Ki ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                   155.8Ki ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                       90.79Ki ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                   144.6Ki ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                   170.2Ki ± 0%
ImagePackageCatalogers/java-cataloger-2                                     2.720Mi ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                     1.555Ki ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                    129.2Ki ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                         3.133Ki ± 0%
ImagePackageCatalogers/dotnet-deps-cataloger-2                              314.5Ki ± 0%
ImagePackageCatalogers/portage-cataloger-2                                  77.23Ki ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                36.07Ki ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                     13.57Ki ± 0%
ImagePackageCatalogers/binary-cataloger-2                                   29.91Ki ± 0%
geomean                                                                     101.7Ki

                                                          │ ./.tmp/benchmark-14e8cb4.txt │
                                                          │          allocs/op           │
ImagePackageCatalogers/alpmdb-cataloger-2                                    86.71k ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                              2.049k ± 0%
ImagePackageCatalogers/python-package-cataloger-2                            15.49k ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                    3.457k ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                        1.205k ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                    2.646k ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                    3.759k ± 0%
ImagePackageCatalogers/java-cataloger-2                                      38.26k ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                       40.00 ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                     3.438k ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                           101.0 ± 0%
ImagePackageCatalogers/dotnet-deps-cataloger-2                               5.011k ± 0%
ImagePackageCatalogers/portage-cataloger-2                                   1.539k ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                  671.0 ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                       392.0 ± 0%
ImagePackageCatalogers/binary-cataloger-2                                     872.0 ± 0%
geomean                                                                      2.062k

github-actions[bot] avatar Apr 07 '23 19:04 github-actions[bot]

May I know why this PR has not been merged? It's extremely helpful for deduplicating components across layers.

Deep232 avatar Feb 21 '24 04:02 Deep232

Do you have an ETA for this addition? It would be helpful.

tomerse-sg avatar Mar 25 '24 14:03 tomerse-sg

Hi @tomerse-sg and @Deep232, thanks for the notes, we don't have an ETA but we will take a look and see if we can move this forward. Thank you for letting us know this would be useful for you!

tgerla avatar Mar 28 '24 20:03 tgerla

Can you please elaborate on the problem you described in the PR description? What will the difference be between all-layers and this mode in the case of deleted packages? @tgerla @wagoodman

tomersein avatar Jul 21 '24 07:07 tomersein

I ran a test on a golang 1.14 image using both all-layers and squashed-with-all-layers, and I didn't see any difference between the two JSON outputs. Can you please elaborate on how we plan to mark packages that don't exist in the squashed representation?

TimBrown1611 avatar Jul 21 '24 08:07 TimBrown1611

Another question: this PR seems to be based on syft 0.76.0. Do you think it would be possible to contribute a new PR aligned with the newest syft?

Another thing: I think I've found a bug. I created this Dockerfile:

# Use the alpine base image
FROM alpine:latest

# Install curl
RUN apk add --no-cache curl

# Copy the file test.txt to the container
COPY test.txt /test.txt

# Install jq
RUN apk add --no-cache jq

# Remove jq
RUN apk del jq

# Install jq a second time, then remove it again
RUN apk add --no-cache jq

RUN apk del jq


# Set a default command for the container
CMD ["sh"]

When I scan it, I do see "jq", although I expect not to see it, since it was removed. Otherwise there is no difference between all-layers and squashed-with-all-layers.

TimBrown1611 avatar Jul 21 '24 11:07 TimBrown1611

I think this solves the problem of the deleted package: https://github.com/anchore/syft/pull/3138. I opened a new PR since a lot has changed in syft. Let me know how to proceed further; this feature is useful :)

tomersein avatar Aug 20 '24 16:08 tomersein