syft
PackageLicenseDeclared is not generated correctly
What happened:
I am trying to generate an SBOM for my Gradle project. I noticed that PackageLicenseDeclared is NONE for all packages, even though the licence information is available in the packages. I verified this by running the CycloneDX tool, which correctly populated the field and included the licence text at the end of its output.
Below are two outputs generated by Syft and CycloneDX for jackson-databind package I am using in my project.
From Syft:
##### Package: jackson-databind
PackageName: jackson-databind
SPDXID: SPDXRef-Package-java-archive-jackson-databind
PackageVersion: 2.11.4
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson-databind:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson_databind:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson-databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson_databind:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:fasterxml:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:jackson:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:core:jackson:2.11.4:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.11.4
From CycloneDX
PackageName: jackson-databind
SPDXID: SPDXRef-1
PackageVersion: 2.11.4
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageChecksum: SHA1: 5d9f3d441f99d721b957e3497f0a6465c764fad4
PackageChecksum: SHA256: dc64fa3907bd299f29ad6116169e583333d04404b23a0f81ed679afa8e2a2ee8
PackageChecksum: SHA384: be392d31669d87a6f76f1049ce75447f5dfd1eb94f22c11e214dc04f4e637b41040fd44a13776d04b84a0825b3423fa7
PackageChecksum: SHA512: ff8c9dfd6ce61842df1bc7443d6058ba3efa6eb0e728562fd5d09e00e7fffc22b0a041045550cec9972ef4fda769925c117e46f68aefe80842c67460550d44c1
PackageLicenseConcluded: NOASSERTION
PackageLicenseDeclared: Apache-2.0
PackageCopyrightText: NOASSERTION
External-Ref: PACKAGE-MANAGER purl pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.11.4?type=jar
What you expected to happen:
Syft should populate PackageLicenseDeclared: Apache-2.0.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: I have also used the container image for my project. It only generates licence info for base image but not for the dependencies.
Environment:
Application: syft
Version: 0.34.0
BuildDate: 2021-12-22T21:15:39Z
GitCommit: 7f8cb0bd8068709982ef23e931e1ceaa2dfac955
GitTreeState: clean
Platform: linux/amd64
GoVersion: go1.16.12
Compiler: gc
I am also seeing this issue. On investigation I found that PackageCopyrightText is always blank. Overall I found that all fields except a few are blank or wrong.
This entry is a template:
PackageName: addressable
SPDXID: SPDXRef-Package-gem-addressable
PackageVersion: 2.7.0
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:addressable:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby-lang:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby_lang:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:ruby:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: SECURITY cpe23Type cpe:2.3:a:*:addressable:2.7.0:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:gem/addressable@2.7.0
The only real data being extracted is in these fields:
PackageName
PackageVersion
PackageType
Otherwise it seems like syft simply isn't running real code, for the most part. For example PACKAGE_MANAGER is always purl:
PACKAGE_MANAGER purl pkg:golang/../../../cwf/k8s/go
Sample output here: magma.txt
Thanks for the report @lucasgonze and @muzammil786!
@muzammil786 would you be able to run syft again with your example where the file-contents and file-classification catalogers are enabled? You can do so by using the config provided in the README.
On my local machine I ran syft ubuntu:latest -o spdx with those catalogers enabled, and got this package in the output:
##### Package: libgcrypt20
PackageName: libgcrypt20
SPDXID: SPDXRef-Package-deb-libgcrypt20
PackageVersion: 1.8.5-5ubuntu1.1
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: GPL-2.0
PackageLicenseDeclared: GPL-2.0
PackageCopyrightText: NOASSERTION
ExternalRef: SECURITY cpe23Type cpe:2.3:a:libgcrypt20:libgcrypt20:1.8.5-5ubuntu1.1:*:*:*:*:*:*:*
ExternalRef: PACKAGE_MANAGER purl pkg:deb/ubuntu/libgcrypt20@1.8.5-5ubuntu1.1?arch=amd64
There were also packages where:
PackageLicenseConcluded: NONE
PackageLicenseDeclared: NONE
But for packages that had licenses declared in the correct parts of their structure we were able to extract the information.
@spiffcs It works for your example because libgcrypt20 is an OS package. The issue I raised is about the dependencies of our application. I tried your suggestion:
podman run -v ~/Documents/containers:/opt:z -e "SYFT_OUTPUT=spdx" -e "SYFT_CHECK_FOR_APP_UPDATE=false" -e "SYFT_FILE_CONTENTS_CATALOGER_SCOPE=all-layers" -e "SYFT_FILE_CLASSIFICATION_CATALOGER_SCOPE=all-layers" docker-registry.repo.ukdn.thalesuk/anchore/syft:v0.36.0 packages /opt/gradle-pipeline-example_1.0.0.11-RELEASE.tar
But got the same result.
Hello, any news on this issue? Can I help in any way?
Is there an option to skip the mapping step and write the original licence value directly (without normalization)?
Thanks
Hi @spiffcs - any chance this could be picked up please? Let me know if I can help.
👋 Hey @muzammil786 I'm going through some old issues and can pick this one back up this afternoon. A couple of large license updates have gone in recently so this might already be solved.
@dja-fr - for your questions below:
Is there an option to skip the mapping step and write the original licence value directly (without normalization)?
All syft packages have licenses that are represented by the struct below. The original value that was read should always be in Value, which syft then attempts to convert to a valid SPDX expression if one exists. Are you asking about a specific format where you cannot find Value?
// License represents an SPDX Expression or license value extracted from a packages metadata
// We want to ignore URLs and Location since we merge these fields across equal licenses.
// A License is a unique combination of value, expression and type, where
// its sources are always considered merged and additions to the evidence
// of where it was found and how it was sourced.
// This is different from how we treat a package since we consider package paths
// in order to distinguish if packages should be kept separate
// this is different for licenses since we're only looking for evidence
// of where a license was declared/concluded for a given package
type License struct {
	Value          string             `json:"value"`
	SPDXExpression string             `json:"spdxExpression"`
	Type           license.Type       `json:"type"`
	URLs           internal.StringSet `hash:"ignore"`
	Locations      file.LocationSet   `hash:"ignore"`
}
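To illustrate the behaviour described above (raw value always kept, SPDX expression filled in only when a conversion succeeds), here is a minimal sketch. It is not syft's actual implementation: the trimmed-down License type, the knownIDs table, and NewLicense are all hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// License is a trimmed-down stand-in for the struct above; Type, URLs and
// Locations are omitted to keep the sketch self-contained.
type License struct {
	Value          string
	SPDXExpression string
}

// knownIDs is a hypothetical, tiny subset of the SPDX license list,
// keyed by lower-cased identifier.
var knownIDs = map[string]string{
	"apache-2.0": "Apache-2.0",
	"mit":        "MIT",
}

// NewLicense keeps the raw value untouched and fills SPDXExpression only
// when the value is already a recognizable SPDX identifier.
func NewLicense(value string) License {
	id := knownIDs[strings.ToLower(strings.TrimSpace(value))]
	return License{Value: value, SPDXExpression: id}
}

func main() {
	for _, v := range []string{"apache-2.0", "The Apache Software License, Version 2.0"} {
		l := NewLicense(v)
		fmt.Printf("value=%q spdxExpression=%q\n", l.Value, l.SPDXExpression)
	}
}
```

Under these assumptions, a free-text name such as "The Apache Software License, Version 2.0" keeps its Value but ends up with an empty SPDXExpression.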
Just to follow up on this one, since I saw it was still in review:
Case 1 - Scanning the jar itself
This is the command I ran. If @muzammil786 or others on this thread have inputs where the license is not being discovered, please list them below.
syft -o json jackson-databind-2.16.1.jar
syft-json
"licenses": [
  {
    "value": "https://www.apache.org/licenses/LICENSE-2.0.txt",
    "spdxExpression": "",
    "type": "declared",
    "urls": [],
    "locations": [
      {
        "path": "/jackson-databind-2.16.1.jar",
        "accessPath": "/jackson-databind-2.16.1.jar",
        "annotations": {
          "evidence": "primary"
        }
      }
    ]
  }
],
"value": "https://www.apache.org/licenses/LICENSE-2.0.txt"
^ This value is being propagated to the other formats when syft scans the jar itself. There might be an improvement we can make here: if syft encounters a URL as the license value, it could attempt to match the contents to an SPDX ID.
Case 2 - Encountering jackson-databind in the pom.xml
{
  "id": "86ca0818a5df9609",
  "name": "jackson-databind",
  "version": "2.13.0",
  "type": "java-archive",
  "foundBy": "java-pom-cataloger",
  "locations": [
    {
      "path": "/pom.xml",
      "accessPath": "/pom.xml",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [
    {
      "value": "The Apache Software License, Version 2.0",
      "spdxExpression": "",
      "type": "declared",
      "urls": [],
      "locations": []
    }
  ],
Here we're pulling the correct value, but it's still not in SPDX-ID format.
It looks like we still have some work to do on this issue to get to the conclusive correct answer for both cases where the value should be:
Apache-2.0
The license returned from maven is still:
Syft needs a few more upgrades in the Java cataloger to get from this answer to the correct one, which is the SPDX ID.
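One possible shape for that conversion is a normalize-then-lookup step that handles both the URL value from Case 1 and the free-text name from Case 2. The sketch below is only an illustration under stated assumptions: the aliases table, normalize, and SPDXID are hypothetical names, not syft APIs, and it matches on the URL string itself rather than on the contents fetched from that URL.

```go
package main

import (
	"fmt"
	"strings"
)

// aliases is a hypothetical table mapping normalized license names and
// URLs to SPDX identifiers. A real table would need many more entries.
var aliases = map[string]string{
	"apache software license version 2.0": "Apache-2.0",
	"apache.org/licenses/license-2.0":     "Apache-2.0",
}

// normalize lower-cases the value. For URLs it strips the scheme, a
// leading "www.", a trailing slash and a .txt/.html extension. For names
// it drops commas, collapses whitespace and removes a leading "the".
func normalize(value string) string {
	v := strings.ToLower(strings.TrimSpace(value))
	if strings.HasPrefix(v, "http://") || strings.HasPrefix(v, "https://") {
		v = strings.TrimPrefix(v, "https://")
		v = strings.TrimPrefix(v, "http://")
		v = strings.TrimPrefix(v, "www.")
		v = strings.TrimSuffix(v, "/")
		v = strings.TrimSuffix(v, ".txt")
		v = strings.TrimSuffix(v, ".html")
		return v
	}
	v = strings.ReplaceAll(v, ",", " ")
	v = strings.Join(strings.Fields(v), " ")
	return strings.TrimPrefix(v, "the ")
}

// SPDXID returns the SPDX identifier for a raw declared value, if the
// normalized form is present in the alias table.
func SPDXID(value string) (string, bool) {
	id, ok := aliases[normalize(value)]
	return id, ok
}

func main() {
	values := []string{
		"https://www.apache.org/licenses/LICENSE-2.0.txt", // value from Case 1
		"The Apache Software License, Version 2.0",        // value from Case 2
	}
	for _, v := range values {
		id, ok := SPDXID(v)
		fmt.Printf("%q -> %q (matched: %v)\n", v, id, ok)
	}
}
```

A lookup table keeps the raw Value intact and only adds an identifier when there is an exact normalized match, so an unknown license is left unconverted rather than guessed.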
To get the maven license detailed above you can enable these options in the config:
java:
  maven-url: "https://repo1.maven.org/maven2"
  max-parent-recursive-depth: 5
  # enables Syft to use the network to fill in more detailed information about artifacts
  # currently this enables searching maven-url for license data
  # when running across pom.xml files that could have more information, syft will
  # explicitly search maven for license information by querying the online pom when this is true
  # this option is helpful for when the parent pom has more data,
  # that is not accessible from within the final built artifact
  use-network: true
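For context on what "license data" in a POM looks like, here is a small sketch that reads the licenses section of a POM with Go's encoding/xml package. It is not syft's cataloger code: the pom and pomLicense types, parseLicenses, and the inline samplePOM fragment are assumptions made for this example, and no network request is performed.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// pomLicense mirrors one <license> element of a Maven POM.
type pomLicense struct {
	Name string `xml:"name"`
	URL  string `xml:"url"`
}

// pom captures only the <licenses> section; every other element is ignored.
type pom struct {
	Licenses []pomLicense `xml:"licenses>license"`
}

// samplePOM is an illustrative fragment, not the real jackson-databind POM.
const samplePOM = `<project>
  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>https://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
  </licenses>
</project>`

// parseLicenses extracts the declared licenses from raw POM bytes.
func parseLicenses(data []byte) ([]pomLicense, error) {
	var p pom
	if err := xml.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	return p.Licenses, nil
}

func main() {
	licenses, err := parseLicenses([]byte(samplePOM))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range licenses {
		fmt.Printf("name=%q url=%q\n", l.Name, l.URL)
	}
}
```

A built artifact's own POM often has no licenses section and inherits it from a parent, which is why the config comments above mention querying the online POM when the parent has more data.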
While we're now getting a value for this field (the original ask was about the field not being populated), I'm going to leave this issue open until we have a better mechanism for converting this new value into the correct SPDX ID.