
PackageLicenseDeclared is not generated correctly

Open: muzammil786 opened this issue 3 years ago · 8 comments

What happened:

I am trying to generate an SBOM for my Gradle project. I noticed that PackageLicenseDeclared is NONE for all packages, even though the licence information is available in the packages. I verified this by running the CycloneDX tool, which correctly populated the information, with the licence text at the end.

Below are two outputs, generated by Syft and CycloneDX, for the jackson-databind package I am using in my project.

From Syft:

##### Package: jackson-databind

PackageName: jackson-databind
SPDXID: SPDXRef-Package-java-archive-jackson-databind
PackageVersion: 2.11.4
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.11.4

From CycloneDX:

PackageName: jackson-databind
SPDXID: SPDXRef-1
PackageVersion: 2.11.4
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageChecksum: SHA1: 5d9f3d441f99d721b957e3497f0a6465c764fad4
PackageChecksum: SHA256: dc64fa3907bd299f29ad6116169e583333d04404b23a0f81ed679afa8e2a2ee8
PackageChecksum: SHA384: be392d31669d87a6f76f1049ce75447f5dfd1eb94f22c11e214dc04f4e637b41040fd44a13776d04b84a0825b3423fa7
PackageChecksum: SHA512: ff8c9dfd6ce61842df1bc7443d6058ba3efa6eb0e728562fd5d09e00e7fffc22b0a041045550cec9972ef4fda769925c117e46f68aefe80842c67460550d44c1
PackageLicenseConcluded: NOASSERTION
PackageLicenseDeclared: Apache-2.0
PackageCopyrightText: NOASSERTION
External-Ref: PACKAGE-MANAGER purl pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.11.4?type=jar
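To check a whole SPDX tag-value document for this symptom rather than one package at a time, a small script can list every package whose PackageLicenseDeclared is NONE or NOASSERTION. This is a minimal sketch, not part of Syft; the function name is hypothetical, and it assumes each package block lists PackageName before PackageLicenseDeclared, as in both outputs above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingDeclaredLicenses scans SPDX tag-value text and returns the names of
// packages whose PackageLicenseDeclared is NONE or NOASSERTION.
// It assumes PackageName appears before PackageLicenseDeclared in each block.
func missingDeclaredLicenses(doc string) []string {
	var missing []string
	current := ""
	scanner := bufio.NewScanner(strings.NewReader(doc))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Split "Tag: value" on the first colon only, so values such as
		// cpe:2.3:... or pkg:maven/... stay intact.
		tag, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch tag {
		case "PackageName":
			current = value
		case "PackageLicenseDeclared":
			if value == "NONE" || value == "NOASSERTION" {
				missing = append(missing, current)
			}
		}
	}
	return missing
}

func main() {
	doc := `PackageName: jackson-databind
PackageVersion: 2.11.4
PackageLicenseDeclared: NONE
PackageName: libgcrypt20
PackageVersion: 1.8.5-5ubuntu1.1
PackageLicenseDeclared: GPL-2.0`
	fmt.Println(missingDeclaredLicenses(doc)) // [jackson-databind]
}
```

Saving Syft's spdx output to a file and passing its contents to the same function would give the full list for a project.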

What you expected to happen:

Syft should populate PackageLicenseDeclared: Apache-2.0.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: I have also scanned the container image for my project. Syft only generates licence info for the base image, not for the dependencies.

Environment:
Application: syft
Version: 0.34.0
BuildDate: 2021-12-22T21:15:39Z
GitCommit: 7f8cb0bd8068709982ef23e931e1ceaa2dfac955
GitTreeState: clean
Platform: linux/amd64
GoVersion: go1.16.12
Compiler: gc

muzammil786, Dec 24 '21

I am also seeing this issue. On investigation, I found that PackageCopyrightText is always empty (NOASSERTION). Overall, all but a few fields are blank or wrong.

This entry is typical:

PackageName: addressable
SPDXID: SPDXRef-Package-gem-addressable
PackageVersion: 2.7.0
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:addressable:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby-lang:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby_lang:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:*:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:gem/addressable@2.7.0

The only real data being extracted is in these fields:

PackageName
PackageVersion
PackageType

Otherwise, it seems as if Syft simply isn't extracting real data for most fields. For example, PACKAGE_MANAGER is always a purl:

PACKAGE_MANAGER purl pkg:golang/../../../cwf/k8s/go

Sample output here: magma.txt
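For reference, the PACKAGE_MANAGER reference holds a package URL (purl) of the form pkg:type/namespace/name@version?qualifiers#subpath, and purl is one of the reference types SPDX defines for that category; what looks wrong in the golang example is the purl's content. A simplified parsing sketch (a hypothetical helper, not a full purl-spec parser: qualifiers and subpath are discarded and nothing is percent-decoded) shows that this example has no version and a namespace made of ../ segments:

```go
package main

import (
	"fmt"
	"strings"
)

// purl holds the main components of a package URL.
type purl struct {
	Type, Namespace, Name, Version string
}

// parsePURL splits pkg:type/namespace/name@version?qualifiers#subpath.
// Simplified: qualifiers and subpath are dropped and components are not
// percent-decoded.
func parsePURL(s string) (purl, error) {
	if !strings.HasPrefix(s, "pkg:") {
		return purl{}, fmt.Errorf("missing pkg: scheme in %q", s)
	}
	rest := strings.TrimPrefix(s, "pkg:")
	// Strip subpath and qualifiers, splitting from the right.
	if i := strings.LastIndexByte(rest, '#'); i >= 0 {
		rest = rest[:i]
	}
	if i := strings.LastIndexByte(rest, '?'); i >= 0 {
		rest = rest[:i]
	}
	var p purl
	// The version follows the rightmost '@'.
	if i := strings.LastIndexByte(rest, '@'); i >= 0 {
		rest, p.Version = rest[:i], rest[i+1:]
	}
	parts := strings.Split(rest, "/")
	if len(parts) < 2 || parts[len(parts)-1] == "" {
		return purl{}, fmt.Errorf("missing type or name in %q", s)
	}
	p.Type = parts[0]
	p.Name = parts[len(parts)-1]
	p.Namespace = strings.Join(parts[1:len(parts)-1], "/")
	return p, nil
}

func main() {
	for _, s := range []string{
		"pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.11.4?type=jar",
		"pkg:golang/../../../cwf/k8s/go",
	} {
		p, err := parsePURL(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("type=%s namespace=%q name=%s version=%q\n", p.Type, p.Namespace, p.Name, p.Version)
	}
}
```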

lucasgonze, Jan 20 '22

Thanks for the report @lucasgonze and @muzammil786!

@muzammil786 would you be able to run Syft again on your example with the file-contents and file-classification catalogers enabled? You can enable them using the configuration described in the README.

Locally, I ran syft ubuntu:latest -o spdx with those catalogers enabled and got this package in the output:

##### Package: libgcrypt20

PackageName: libgcrypt20
SPDXID: SPDXRef-Package-deb-libgcrypt20
PackageVersion: 1.8.5-5ubuntu1.1
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: GPL-2.0
PackageLicenseDeclared: GPL-2.0
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:libgcrypt20:libgcrypt20:1.8.5-5ubuntu1.1:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:deb/ubuntu/libgcrypt20@1.8.5-5ubuntu1.1?arch=amd64

There were also packages where:

PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE

But for packages that declared their licenses in the expected part of their metadata, we were able to extract the information.

spiffcs, Jan 21 '22

@spiffcs That works for your example because libgcrypt20 is an OS package. The issue I raised concerns the dependencies of our application. I tried your suggestion:

podman run -v ~/Documents/containers:/opt:z -e "SYFT_OUTPUT=spdx" -e "SYFT_CHECK_FOR_APP_UPDATE=false" -e "SYFT_FILE_CONTENTS_CATALOGER_SCOPE=all-layers" -e "SYFT_FILE_CLASSIFICATION_CATALOGER_SCOPE=all-layers" docker-registry.repo.ukdn.thalesuk/anchore/syft:v0.36.0 packages /opt/gradle-pipeline-example_1.0.0.11-RELEASE.tar

But I got the same result.

muzammil786, Jan 24 '22

Hello, any news on this issue? Can I help?

Is there an option to skip the mapping step and write the original licence value directly (without normalization)?

thx

dja-fr, Jul 13 '22

Hi @spiffcs - any chance this could be picked up please? Let me know if I can help.

muzammil786, Sep 08 '23

👋 Hey @muzammil786, I'm going through some old issues and can pick this one back up this afternoon. A couple of large license updates have gone in recently, so this might already be solved.

@dja-fr, regarding your question:

Is there an option to avoid mapping step and write directly the original licence value (without normalization) ?

All Syft packages have licenses represented by the struct below. The original value that was read is always stored in Value; Syft then attempts to convert it to a valid SPDX expression (SPDXExpression), if one exists. Are you asking about a specific output format where you cannot find Value?

// License represents an SPDX Expression or license value extracted from a packages metadata
// We want to ignore URLs and Location since we merge these fields across equal licenses.
// A License is a unique combination of value, expression and type, where
// its sources are always considered merged and additions to the evidence
// of where it was found and how it was sourced.
// This is different from how we treat a package since we consider package paths
// in order to distinguish if packages should be kept separate
// this is different for licenses since we're only looking for evidence
// of where a license was declared/concluded for a given package
type License struct {
	Value          string             `json:"value"`
	SPDXExpression string             `json:"spdxExpression"`
	Type           license.Type       `json:"type"`
	URLs           internal.StringSet `hash:"ignore"`
	Locations      file.LocationSet   `hash:"ignore"`
}
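Since Value keeps the raw string and SPDXExpression holds the normalized form, a format writer could keep the original value instead of dropping it when no SPDX ID is found. The sketch below illustrates that idea with a simplified stand-in for the struct above; it is not Syft's SPDX formatter. It emits the SPDX expression when one exists, falls back to a LicenseRef- identifier built from Value (SPDX allows only letters, digits, '.' and '-' after the prefix), and reports NOASSERTION when nothing was found. Joining several licenses with AND is an assumption made for illustration, and a complete SPDX document would also need a matching extracted-licensing entry (LicenseID/ExtractedText) for each LicenseRef it uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// License is a simplified stand-in for Syft's license struct, keeping only
// the raw value and the normalized SPDX expression.
type License struct {
	Value          string
	SPDXExpression string
}

// invalidIDChars matches runs of characters not allowed in a LicenseRef idstring.
var invalidIDChars = regexp.MustCompile(`[^A-Za-z0-9.-]+`)

// declaredField builds a PackageLicenseDeclared value: the SPDX expression
// when there is one, otherwise a LicenseRef- built from the raw value, and
// NOASSERTION when no license information was found at all.
func declaredField(licenses []License) string {
	var parts []string
	for _, l := range licenses {
		if l.SPDXExpression != "" {
			parts = append(parts, l.SPDXExpression)
			continue
		}
		raw := strings.TrimSpace(l.Value)
		id := strings.Trim(invalidIDChars.ReplaceAllString(raw, "-"), "-")
		if id == "" {
			continue // nothing usable in the raw value
		}
		parts = append(parts, "LicenseRef-"+id)
	}
	switch len(parts) {
	case 0:
		return "NOASSERTION"
	case 1:
		return parts[0]
	}
	// Parenthesize compound expressions so joining with AND keeps their meaning.
	for i, p := range parts {
		if strings.Contains(p, " ") {
			parts[i] = "(" + p + ")"
		}
	}
	return strings.Join(parts, " AND ")
}

func main() {
	fmt.Println(declaredField(nil)) // NOASSERTION
	fmt.Println(declaredField([]License{{Value: "Apache 2", SPDXExpression: "Apache-2.0"}})) // Apache-2.0
	fmt.Println(declaredField([]License{{Value: "The Apache Software License, Version 2.0"}})) // LicenseRef-The-Apache-Software-License-Version-2.0
}
```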

spiffcs, Sep 21 '23

Just to follow up on this one, since I saw it was still in review:

Case 1 - Scanning the jar itself

This is the command I ran. If @muzammil786 or others on this thread have inputs where this license is not being discovered, please list them below.

syft -o json jackson-databind-2.16.1.jar

syft-json

      "licenses": [
        {
          "value": "https://www.apache.org/licenses/LICENSE-2.0.txt",
          "spdxExpression": "",
          "type": "declared",
          "urls": [],
          "locations": [
            {
              "path": "/jackson-databind-2.16.1.jar",
              "accessPath": "/jackson-databind-2.16.1.jar",
              "annotations": {
                "evidence": "primary"
              }
            }
          ]
        }
      ],

"value": "https://www.apache.org/licenses/LICENSE-2.0.txt"

^ This value is propagated to the other formats when Syft scans the jar itself. There might be an improvement we can make here: if Syft encounters a URL as the license value, it could attempt to match the contents to an SPDX ID.

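A minimal sketch of that improvement, with one substitution: rather than fetching the license text and matching its contents, it matches the URL itself against a lookup table after normalizing away the scheme, a leading "www.", and the file extension. The table entries and function names are hypothetical; a real table could be derived from the seeAlso URLs published in the SPDX license list.

```go
package main

import (
	"fmt"
	"strings"
)

// knownLicenseURLs maps normalized license URLs to SPDX identifiers.
// The entries are a small hypothetical sample, not a complete table.
var knownLicenseURLs = map[string]string{
	"apache.org/licenses/license-2.0":      "Apache-2.0",
	"opensource.org/licenses/mit":          "MIT",
	"opensource.org/licenses/bsd-3-clause": "BSD-3-Clause",
}

// normalizeLicenseURL lowercases a URL and strips the scheme, a leading
// "www.", a trailing slash and a .txt/.html/.htm extension, so that common
// spellings of the same license URL compare equal.
func normalizeLicenseURL(raw string) string {
	u := strings.ToLower(strings.TrimSpace(raw))
	u = strings.TrimPrefix(u, "https://")
	u = strings.TrimPrefix(u, "http://")
	u = strings.TrimPrefix(u, "www.")
	u = strings.TrimSuffix(u, "/")
	for _, ext := range []string{".txt", ".html", ".htm"} {
		u = strings.TrimSuffix(u, ext)
	}
	return u
}

// spdxIDForURL looks up the SPDX identifier for a license URL.
func spdxIDForURL(raw string) (string, bool) {
	id, ok := knownLicenseURLs[normalizeLicenseURL(raw)]
	return id, ok
}

func main() {
	fmt.Println(spdxIDForURL("https://www.apache.org/licenses/LICENSE-2.0.txt")) // Apache-2.0 true
	fmt.Println(spdxIDForURL("http://www.apache.org/licenses/LICENSE-2.0"))      // Apache-2.0 true
}
```

A URL lookup avoids network access and text matching, but it only helps for URLs that are already in the table; unknown URLs would still need the content-matching approach described above.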
Case 2 - Encountering jackson-databind in the pom.xml

    {
      "id": "86ca0818a5df9609",
      "name": "jackson-databind",
      "version": "2.13.0",
      "type": "java-archive",
      "foundBy": "java-pom-cataloger",
      "locations": [
        {
          "path": "/pom.xml",
          "accessPath": "/pom.xml",
          "annotations": {
            "evidence": "primary"
          }
        }
      ],
      "licenses": [
        {
          "value": "The Apache Software License, Version 2.0",
          "spdxExpression": "",
          "type": "declared",
          "urls": [],
          "locations": []
        }
      ],

Here we're pulling the correct value, but it's still not in the SPDX-ID format.
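In both cases the license carries a raw value but an empty spdxExpression. To list every such license in a syft-json document, something like the following could be used. It is a sketch that assumes the syft-json layout shown above, with packages under a top-level artifacts array; only the fields it needs are decoded, and the helper name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// syftDoc is a partial view of a syft-json document: only the fields used
// here are declared, not the full schema.
type syftDoc struct {
	Artifacts []struct {
		Name     string `json:"name"`
		Version  string `json:"version"`
		Licenses []struct {
			Value          string `json:"value"`
			SPDXExpression string `json:"spdxExpression"`
			Type           string `json:"type"`
		} `json:"licenses"`
	} `json:"artifacts"`
}

// unmappedLicenses returns "name@version: value" for every license that has a
// raw value but no SPDX expression.
func unmappedLicenses(data []byte) ([]string, error) {
	var doc syftDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var out []string
	for _, a := range doc.Artifacts {
		for _, l := range a.Licenses {
			if l.SPDXExpression == "" && l.Value != "" {
				out = append(out, fmt.Sprintf("%s@%s: %s", a.Name, a.Version, l.Value))
			}
		}
	}
	return out, nil
}

func main() {
	data := []byte(`{"artifacts":[{"name":"jackson-databind","version":"2.13.0",
	  "licenses":[{"value":"The Apache Software License, Version 2.0","spdxExpression":"","type":"declared"}]}]}`)
	out, err := unmappedLicenses(data)
	if err != nil {
		panic(err)
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```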

It looks like we still have some work to do on this issue to reach the correct answer in both cases, where the value should be:

Apache-2.0

The license returned from Maven is still: [image: Screenshot 2024-02-09 at 12 39 49 PM]

Syft needs a few more upgrades in the Java cataloger to get from this answer to the correct one, which is the SPDX ID.
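One possible shape for that step is a lookup from common license names, like the one the pom.xml reports, to SPDX IDs. The sketch below is illustrative only: the name table is a small hypothetical sample, and the normalization (lowercase, commas dropped, whitespace collapsed, a leading "the" removed) is an assumption about which variations are worth ignoring. Names not in the table stay unmapped rather than being guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// knownLicenseNames maps normalized license names to SPDX identifiers.
// The entries are a small hypothetical sample, not an exhaustive list.
var knownLicenseNames = map[string]string{
	"apache software license version 2.0": "Apache-2.0",
	"apache license version 2.0":          "Apache-2.0",
	"apache license 2.0":                  "Apache-2.0",
	"mit license":                         "MIT",
}

// normalizeLicenseName lowercases the name, replaces commas with spaces,
// collapses runs of whitespace, and drops a leading "the".
func normalizeLicenseName(name string) string {
	n := strings.ToLower(strings.ReplaceAll(name, ",", " "))
	n = strings.Join(strings.Fields(n), " ")
	return strings.TrimPrefix(n, "the ")
}

// spdxIDForName looks up the SPDX identifier for a license name.
func spdxIDForName(name string) (string, bool) {
	id, ok := knownLicenseNames[normalizeLicenseName(name)]
	return id, ok
}

func main() {
	fmt.Println(spdxIDForName("The Apache Software License, Version 2.0")) // Apache-2.0 true
}
```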

To get the Maven license detailed above, you can enable these options in the config:

java:
   maven-url: "https://repo1.maven.org/maven2"
   max-parent-recursive-depth: 5
   # enables Syft to use the network to fill in more detailed information about artifacts
   # currently this enables searching maven-url for license data
   # when running across pom.xml files that could have more information, syft will
   # explicitly search maven for license information by querying the online pom when this is true
   # this option is helpful for when the parent pom has more data,
   # that is not accessible from within the final built artifact
   use-network: true
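Roughly, use-network means looking up an artifact's POM in the repository at maven-url and, when that POM declares no license, walking up its parent POMs, which max-parent-recursive-depth bounds. The sketch below illustrates that idea; it is not Syft's code, and whether Syft counts depth the same way is not confirmed here. pomURL follows the standard Maven repository layout, the fetch function stands in for an HTTP GET so the example runs offline, and the com.example coordinates are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// pom is a partial view of a Maven POM: only the parent coordinates and the
// license names are decoded.
type pom struct {
	Parent struct {
		GroupID    string `xml:"groupId"`
		ArtifactID string `xml:"artifactId"`
		Version    string `xml:"version"`
	} `xml:"parent"`
	Licenses []string `xml:"licenses>license>name"`
}

// pomURL builds a POM location using the standard Maven repository layout,
// where the dots in groupId become path separators.
func pomURL(repo, groupID, artifactID, version string) string {
	return fmt.Sprintf("%s/%s/%s/%s/%s-%s.pom",
		strings.TrimSuffix(repo, "/"), strings.ReplaceAll(groupID, ".", "/"),
		artifactID, version, artifactID, version)
}

// resolveLicenses returns the licenses declared by the POM at the given
// coordinates, following <parent> links up to maxDepth levels when a POM
// declares none. fetch stands in for downloading a POM by URL.
func resolveLicenses(fetch func(url string) ([]byte, error), repo, g, a, v string, maxDepth int) ([]string, error) {
	for depth := 0; depth <= maxDepth; depth++ {
		data, err := fetch(pomURL(repo, g, a, v))
		if err != nil {
			return nil, err
		}
		var p pom
		if err := xml.Unmarshal(data, &p); err != nil {
			return nil, err
		}
		if len(p.Licenses) > 0 {
			return p.Licenses, nil
		}
		if p.Parent.ArtifactID == "" {
			break // no parent to fall back to
		}
		g, a, v = p.Parent.GroupID, p.Parent.ArtifactID, p.Parent.Version
	}
	return nil, nil
}

func main() {
	const repo = "https://repo1.maven.org/maven2"
	// Canned POMs keyed by URL: the child declares no license, its parent does.
	poms := map[string]string{
		pomURL(repo, "com.example", "child", "1.0"): `<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent><groupId>com.example</groupId><artifactId>parent</artifactId><version>1.0</version></parent>
</project>`,
		pomURL(repo, "com.example", "parent", "1.0"): `<project xmlns="http://maven.apache.org/POM/4.0.0">
  <licenses><license><name>The Apache Software License, Version 2.0</name></license></licenses>
</project>`,
	}
	fetch := func(url string) ([]byte, error) {
		body, ok := poms[url]
		if !ok {
			return nil, fmt.Errorf("not found: %s", url)
		}
		return []byte(body), nil
	}
	licenses, err := resolveLicenses(fetch, repo, "com.example", "child", "1.0", 5)
	fmt.Println(licenses, err)
}
```

The names found this way would still need a name-to-SPDX-ID mapping before they could appear in PackageLicenseDeclared.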

spiffcs, Feb 09 '24

While we're now getting a value for this field (the original ask was about replacing NONE with a real value), I'm going to leave this issue open until we have a better mechanism for converting this new answer into the correct SPDX ID.

spiffcs, Feb 09 '24