
Name and Version empty for Java package when scanning provided image

Open · spiffcs opened this issue 2 years ago · 2 comments

What happened: When running syft built from the tip of main against the image caphill4/syft-manifest-bug:latest, I saw the following behavior:

  • Name field was blank for multiple discovered packages
  • Version field was blank for multiple discovered packages
  • Java metadata contained a manifest version, but no other manifest details

syft -o json caphill4/syft-manifest-bug:latest

Example package (there were multiple of these):

  {
   "id": "40358cd756d70d11",
   "name": "",
   "version": "",
   "type": "java-archive",
   "foundBy": "java-cataloger",
   "locations": [
    {
     "path": "/opt/asserts/api-server/enterprise-server.jar",
     "layerID": "sha256:4d8a814cf85fcbdaa2cff3f001c705392dcc05e1bf659fcaac718b84e9dfc662",
     "annotations": {
      "evidence": "primary"
     }
    }
   ],
   "licenses": [],
   "language": "java",
   "cpes": [],
   "purl": "pkg:maven/",
   "metadataType": "JavaMetadata",
   "metadata": {
    "virtualPath": "/opt/asserts/api-server/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar",
    "manifest": {
     "main": {
      "Manifest-Version": "1.0"
     }
    },
    "digest": [
     {
      "algorithm": "sha1",
      "value": "4c1415ccb35494ea281446ce12463ff40263c910"
     }
    ]
   }
  },

What you expected to happen: Syft should have an option or configuration setting that removes packages in a post-processing step when they lack enough identifying information.

Alternatively, the file cataloger could be enhanced to show nested jar information, so that this information is not lost but moved from package information to file information.

Example of how the proposed option might be invoked (--prune is a proposed flag):

syft -o json --prune caphill4/syft-manifest-bug:latest

The offending jars also produced warnings, which seem to be related to regex matching. Packages are still created for these jars, but because they have almost no identifying information, each package is blank except for the path and virtualPath fields that show its location:

[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/57-30595491.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/67-1304492339.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/67-1304492339.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/32-409283951.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/32-409283951.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/6-710714459.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/6-710714459.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/17-2051466981.jar'
......

Steps to reproduce the issue:

syft -o json caphill4/syft-manifest-bug:latest

Inspect the output for the characteristics described above.
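To inspect the output without scrolling through the full JSON, a small post-processing script can list every artifact that is missing a name or a version, together with its path and virtualPath. This is a minimal sketch that assumes syft's JSON layout (a top-level `artifacts` array whose entries carry `name`, `version`, `locations`, and `metadata.virtualPath`, as in the example package above); `find_unidentified` and the script name are hypothetical, not part of syft.

```python
import json
import sys


def find_unidentified(sbom: dict) -> list[dict]:
    """Return the artifacts that are missing a name or a version, with where they were found."""
    results = []
    for artifact in sbom.get("artifacts", []):
        if artifact.get("name") and artifact.get("version"):
            continue  # enough identifying information; skip it
        locations = artifact.get("locations") or [{}]
        metadata = artifact.get("metadata") or {}
        results.append({
            "id": artifact.get("id"),
            "path": locations[0].get("path", ""),
            # virtualPath is only present for some metadata types, e.g. JavaMetadata
            "virtualPath": metadata.get("virtualPath", ""),
        })
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_unidentified.py sbom.json  (sbom.json produced by `syft -o json`)
    with open(sys.argv[1], encoding="utf-8") as f:
        for entry in find_unidentified(json.load(f)):
            print(f"{entry['id']}\t{entry['path']}\t{entry['virtualPath']}")
```

For example, run `syft -o json caphill4/syft-manifest-bug:latest > sbom.json` and then `python find_unidentified.py sbom.json`; each blank package from the report should appear with its nested virtualPath.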

Anything else we need to know?: Built from syft main as of commit a46d12270f1e49d9bddb8bb4c082dcec34a8e95b.

Environment:

  • Output of syft version: a46d12270f1e49d9bddb8bb4c082dcec34a8e95b
  • OS (e.g. cat /etc/os-release or similar): OSX

spiffcs, Sep 14 '23 14:09

Alright, I've finished investigating this, and here is an update:

The current behavior is correct in that syft identifies the main parent jar, enterprise-server. A package for that parent does exist in the SBOM, along with its manifest information. The confusion about whether it exists comes from conflating the path and virtualPath fields: a reader might incorrectly conclude that blank information is being inserted for path=/opt/enterprise-server.jar. The virtualPath shows that these blank entries actually come from nested jars with limited identifying information, for example virtualPath=/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar.
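The path/virtualPath distinction can be made concrete: path names the file on disk (the outer enterprise-server jar), while virtualPath appends the location inside that archive after a colon. This is a minimal sketch that assumes ':' is the only separator between nesting levels, which is consistent with the example package above but not verified against syft's source; `split_virtual_path` and `is_nested` are hypothetical helpers.

```python
def split_virtual_path(virtual_path: str) -> list[str]:
    """Split a virtualPath into the chain of archives it passes through.

    Assumes ':' separates each archive from the entry nested inside it, as in
    the example package above; a path that itself contains ':' would be split wrongly.
    """
    return virtual_path.split(":")


def is_nested(path: str, virtual_path: str) -> bool:
    """True when the package was found inside the archive at `path`, not as `path` itself."""
    chain = split_virtual_path(virtual_path)
    return len(chain) > 1 and chain[0] == path


outer = "/opt/asserts/api-server/enterprise-server.jar"
nested = outer + ":BOOT-INF/lib/1-555680818.jar"
print(split_virtual_path(nested))  # ['/opt/asserts/api-server/enterprise-server.jar', 'BOOT-INF/lib/1-555680818.jar']
print(is_nested(outer, nested))    # True
print(is_nested(outer, outer))     # False
```

Applied to the example package, the shared path points at the parent jar, while the second element of the virtualPath chain identifies which nested jar produced the blank entry.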

Potential solutions:

  1. A prune option that, in a post-processing step, removes packages that do not have both a name and a version. This is challenging because the file cataloger does not account for nested jar paths by default, so pruning would remove any detection or representation of these nested jars and lead to an incomplete SBOM.

  2. The catalogers are logically decoupled at the moment, but I would be more in favor of the pruning option above if I knew the pruned results showed up somewhere else in the SBOM. The fileCataloger could enhance the files field to show something like the following, with an option to also create a relationship to the parent package:

  {
   "id": "e82d211f6cc65681",
   "location": {
    "path": "/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar",
    "layerID": "sha256:4693057ce2364720d39e57e85a5b8e0bd9ac3573716237736d6470ec5b7b7230"
   }
  },
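Options 1 and 2 could be combined in a post-processing step over the JSON output: drop artifacts without both a name and a version, re-emit each as a files entry whose path is its virtualPath (as in the proposal above), and optionally relate it to its parent package. This is a hypothetical sketch, not syft behavior: `prune_to_files` is an invented helper, the 16-character ID derived from sha256(virtualPath) is an assumed scheme, and the `artifactRelationships` entry shape (`parent`, `child`, `type: "contains"`) is assumed to mirror syft's JSON.

```python
import copy
import hashlib


def _virtual_path(artifact: dict) -> str:
    """virtualPath from the metadata, falling back to the first location's path (assumption)."""
    metadata = artifact.get("metadata") or {}
    locations = artifact.get("locations") or [{}]
    return metadata.get("virtualPath") or locations[0].get("path", "")


def prune_to_files(sbom: dict, relate_to_parent: bool = True) -> dict:
    """Hypothetical prune step: turn artifacts lacking a name or version into file entries."""
    out = copy.deepcopy(sbom)  # leave the caller's SBOM untouched
    artifacts = out.get("artifacts", [])
    kept = [a for a in artifacts if a.get("name") and a.get("version")]
    pruned = [a for a in artifacts if not (a.get("name") and a.get("version"))]
    out["artifacts"] = kept

    # Drop existing relationships that point at a pruned artifact, so no IDs dangle.
    pruned_ids = {a["id"] for a in pruned if "id" in a}
    relationships = [
        r for r in out.get("artifactRelationships", [])
        if r.get("parent") not in pruned_ids and r.get("child") not in pruned_ids
    ]
    out["artifactRelationships"] = relationships

    # Identified packages keyed by their own (virtual) path, used to find each parent jar.
    parents = {_virtual_path(a): a.get("id") for a in kept}
    files = out.setdefault("files", [])
    for artifact in pruned:
        vpath = _virtual_path(artifact)
        layer = (artifact.get("locations") or [{}])[0].get("layerID", "")
        # Assumed ID scheme: first 16 hex characters of sha256(virtualPath).
        file_id = hashlib.sha256(vpath.encode("utf-8")).hexdigest()[:16]
        files.append({"id": file_id, "location": {"path": vpath, "layerID": layer}})
        # The parent is the archive enclosing this entry: everything before the last ':'.
        parent_id = parents.get(vpath.rsplit(":", 1)[0]) if ":" in vpath else None
        if relate_to_parent and parent_id:
            relationships.append({"parent": parent_id, "child": file_id, "type": "contains"})
    return out
```

Keying parents by virtualPath also handles deeper nesting: an entry at `a.jar:b.jar:c.jar` is related to whichever identified package sits at `a.jar:b.jar`. Pass `relate_to_parent=False` to get the plain pruning of option 1 while still keeping a file record of each nested jar.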

Let me know your comments or thoughts in this thread :smiley:. I've also put this into our backlog for future discussion during the community/team sync.

spiffcs, Sep 19 '23 17:09

@spiffcs I think this has come up with a couple of other users as well, so it's probably worth restarting the discussion on what the correct behavior for Syft is here.

My (current) 2c is that a jar without metadata or any other identifiers marking it as a Java artifact is really no different from a tar or zip file. In terms of artifact relationships, it should be treated as an archive that the cataloger recurses into to find other artifacts, rather than as an identified package. The file cataloger can gather digests etc., but the Java cataloger should probably skip the jar if it cannot identify the actual Java software inside it.
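The check described here, whether a jar carries anything that identifies it as Java software, can be sketched as follows. It treats a jar as identifiable if it contains a Maven `pom.properties` under `META-INF/maven/`, or if the main section of `META-INF/MANIFEST.MF` has at least one attribute from an assumed list (`Implementation-Title`, `Bundle-SymbolicName`, and so on). The attribute list and helper names are assumptions for illustration, not syft's cataloger logic. A jar like the nested ones in this issue, whose manifest holds only `Manifest-Version: 1.0`, returns False and would be handled like a plain zip.

```python
import io
import zipfile

# Assumed set of manifest attributes that mark a jar as identifiable Java software.
# Illustrative only, not syft's actual logic; comparison is case-sensitive for simplicity.
IDENTIFYING_ATTRIBUTES = {
    "Implementation-Title", "Implementation-Version",
    "Specification-Title", "Specification-Version",
    "Bundle-SymbolicName", "Bundle-Version", "Automatic-Module-Name",
}


def main_manifest_attributes(text: str) -> dict[str, str]:
    """Parse the main section of a MANIFEST.MF, which ends at the first blank line."""
    attrs: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # end of the main section
        if line.startswith(" ") and last_key:
            attrs[last_key] += line[1:]  # continuation line: drop the single leading space
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            attrs[last_key] = value[1:] if value.startswith(" ") else value
    return attrs


def is_identifiable_jar(data: bytes) -> bool:
    """True if the jar should be reported as a package; False means treat it as a plain zip.

    A False result corresponds to the proposal above: recurse into the archive for
    nested artifacts and leave digests to the file cataloger, but emit no package.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as jar:
        names = jar.namelist()
        # Maven builds usually embed META-INF/maven/<groupId>/<artifactId>/pom.properties.
        if any(n.startswith("META-INF/maven/") and n.endswith("/pom.properties") for n in names):
            return True
        if "META-INF/MANIFEST.MF" in names:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
            return not IDENTIFYING_ATTRIBUTES.isdisjoint(main_manifest_attributes(manifest))
    return False
```

Keeping the decision separate from the recursion means the cataloger could still descend into an unidentifiable jar and report what it finds inside; only the package for the jar itself is skipped.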

This would also be a good candidate for any "known-unknown" classification logic: it would signal to the SBOM reader that there is content that is likely part of an application or artifact but cannot be identified.

zhill, Jun 12 '24 04:06