feat: dpkg license improvement for non-SPDX licenses
What happened: syft can encounter a dpkg copyright file whose license the content-matching regular expression cannot identify, so no license is reported for the package.
In the following example we should find things like:
NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement
Reads contents of copyright
https://github.com/anchore/syft/blob/ca945d16e0949a41aa8786f55d21908242b224c8/syft/pkg/cataloger/debian/package.go#L252-L276
Sends contents for parsing
https://github.com/anchore/syft/blob/ca945d16e0949a41aa8786f55d21908242b224c8/syft/pkg/cataloger/debian/package.go#L101-L106
Searches for license clause
https://github.com/anchore/syft/blob/48f1e975f05183390d7c01718865f5f66e3f9012/syft/pkg/cataloger/debian/parse_copyright.go#L22-L41
What you expected to happen: When a copyright file is found, SOME license information should be created for the package. Reporting no licenses at all is a bug.
Steps to reproduce the issue:
syft -o json nvidia/cuda:12.5.1-cudnn-runtime-ubuntu20.04 | grant list -o json | jq -r '.results[]
| [.license.license_id, .license.name] | @csv' | sed 's/"//g'
- Output of syft version: devel (tip of main)
- OS (e.g. cat /etc/os-release or similar): OSX
I've tracked down a couple of data sources syft could use to identify non-SPDX licenses. I'm currently looking at ways to incorporate them into license identification when generating the SBOM:
https://github.com/nexB/scancode-toolkit https://github.com/nexB/scancode-licensedb
If you'd like a simplified solution to include custom licenses, you might want to take a look here: https://github.com/HeyeOpenSource/syft/tree/Custom_Licenses 😁
N.B.: I just ran make test on it without any failures.
Reopening this, as #3412 and #3876 don't solve it for all cases. Now that both of those are in, we need a more precise change that addresses this for the dpkg cataloger.