tern
Use SPDX license mapping for SPDX format
Describe the Feature
Currently, the default way to list licenses is to use the LicenseID and add some text to it. This can be augmented by using a mapping of known interpretations of these license strings. This issue in spdx-tools-python enables this: https://github.com/spdx/tools-python/issues/106
Once resolved, use this module to map out license strings to SPDX license formats.
Tabling until we can figure out if tern can aggregate SPDX documents produced by other tools.
I am currently having the same problem: I want to import Tern output into dependency-track, but most licenses are not detected correctly. I have started building a license map that maps each LicenseID to the appropriate SPDX license field.
@makefu This project was created to address this issue: https://github.com/spdx/package-licenses-mapping. It's going to take a little while to create the mappings to all known licenses. PRs welcome :)
@nishakm One issue I encountered when I started my own mapping is that a couple of LicenseIDs are not accurate enough to map to a single SPDX identifier (e.g. "gpl", "lgpl+", "openldap", "cc-by" or even "gplv2 with exceptions", as there are different exceptions possible). At least this is what I encountered in RPM-based containers. In addition to trying to map legacy entries, it may be a good idea to contact the distributions and have them clean up their databases to use SPDX in the first place. Another option could be a license database that maps packages and versions to their current SPDX license.
As soon as there is some content in the repository, I will consider creating PRs to add my findings :+1:
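The ambiguity described above can be sketched as a lookup that refuses to guess. This is only an illustration, not Tern's or the package-licenses-mapping project's implementation: the table entries, the `AMBIGUOUS` set, and the `map_license` helper are all hypothetical names chosen for the example, and the ambiguous strings are the ones listed in the comment above.

```python
from typing import Optional

# Illustrative entries only (assumption): not an official or complete mapping.
UNAMBIGUOUS = {
    "isc": "ISC",
    "gpl-2.0-only": "GPL-2.0-only",
}

# Strings named in the discussion as too vague for a single SPDX identifier.
AMBIGUOUS = {"gpl", "lgpl+", "openldap", "cc-by", "gplv2 with exceptions"}


def map_license(raw: str) -> Optional[str]:
    """Return an SPDX identifier, or None when no single identifier applies."""
    key = raw.strip().lower()
    if key in AMBIGUOUS:
        # Do not guess: the caller should keep the original string instead.
        return None
    return UNAMBIGUOUS.get(key)


print(map_license("ISC"))
print(map_license("gpl"))
```

Returning `None` for vague strings keeps the original text available to the caller instead of silently picking one of several possible identifiers.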
The licenses in the SPDX tag-value SBOM output currently still use custom licenses that reference a definition, as in these examples:
LicenseID: LicenseRef-c66410f
ExtractedText: <text>Original license: GPL-2.0-only</text>
LicenseID: LicenseRef-1eaea05
ExtractedText: <text>Original license: ISC</text>
LicenseID: LicenseRef-f266d93
ExtractedText: <text>Original license: BSD</text>
This makes it quite a challenge for automated tools to interpret package licenses from this SBOM format, although the information appears to be available in the default Tern JSON output.
Are there any plans to address this, or perhaps make life easier for automated tools that rely on SPDX input by providing the approximate SPDX license identifier as "licenseName" field?
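Until something like that exists, a consuming tool could recover the original strings from the tag-value output itself. The following is a minimal sketch that assumes every custom license appears as a `LicenseID:` line immediately followed by an `ExtractedText:` line with the `Original license:` prefix, exactly as in the excerpt above; real documents may not follow this layout. The `extracted_licenses` helper is a hypothetical name, not part of Tern or spdx-tools.

```python
import re

# Matches one LicenseID / ExtractedText pair in the layout shown in the excerpt.
PAIR = re.compile(
    r"LicenseID:\s*(LicenseRef-\S+)\s+"
    r"ExtractedText:\s*<text>Original license:\s*(.*?)</text>",
    re.DOTALL,
)


def extracted_licenses(tag_value: str) -> dict:
    """Map each LicenseRef identifier to the original license string."""
    return {ref: text.strip() for ref, text in PAIR.findall(tag_value)}


sample = """LicenseID: LicenseRef-c66410f
ExtractedText: <text>Original license: GPL-2.0-only</text>
LicenseID: LicenseRef-1eaea05
ExtractedText: <text>Original license: ISC</text>
LicenseID: LicenseRef-f266d93
ExtractedText: <text>Original license: BSD</text>
"""

for ref, original in extracted_licenses(sample).items():
    print(ref, "->", original)
```

The recovered strings still need interpretation: "BSD" in the sample, for instance, does not identify a single SPDX license.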
@timovandeput
> The licenses in the SPDX tag-value SBOM output currently still use custom licenses that reference a definition, as in these examples:
> LicenseID: LicenseRef-c66410f
> ExtractedText: <text>Original license: GPL-2.0-only</text>
> LicenseID: LicenseRef-1eaea05
> ExtractedText: <text>Original license: ISC</text>
> LicenseID: LicenseRef-f266d93
> ExtractedText: <text>Original license: BSD</text>
> This makes it quite a challenge for automated tools to interpret package licenses from this SBOM format, although the information appears to be available in the default Tern JSON output.
Sorry, can you clarify what information "appears to be available in the default Tern JSON output" that's not available in the SPDX reports?
> Are there any plans to address this, or perhaps make life easier for automated tools that rely on SPDX input by providing the approximate SPDX license identifier as "licenseName" field?
See discussion above for challenges surrounding this. A mapping needs to exist before Tern can draw conclusions about what SPDX license might correspond to the custom licenses found.
Is the example you provided an excerpt of licenses from a Debian-based image by chance? Debian images get their license info from the debian-inspector library, which parses Debian copyright files, because Debian package licenses cannot be collected through the package manager. Because these licenses are parsed from copyright text, it's not always straightforward to translate them to SPDX licenses, and this is where we see the most variance between the license text and what the corresponding SPDX license might be. However, this is also true for other base images.
Looking at the licenseName field more in the SPDX spec, it seems like this field is appropriate "if license is not on the SPDX license list" which doesn't seem right for the examples you provided because they are all licenses on the SPDX license list (with the exception of BSD, which doesn't specify a version). I think PackageLicenseDeclared is what we would aim for.
Perhaps @pombredanne can weigh in on whether there are plans, or whether it's possible, to map Debian licenses found via debian-inspector to SPDX licenses?
UPDATE: Looks like there's been lots of discussion on this already here: https://github.com/spdx/package-licenses-mapping/issues/1
> Perhaps @pombredanne can weigh in if there's plans/it's possible to map debian licenses found via debian-inspector to SPDX licenses?
In the end, we dropped most of the mappings we were using in ScanCode, as in most cases they are not enough. This is particularly true for Debian copyright files, where mappings are mostly incorrect because of the nature of these files.
See: https://github.com/nexB/scancode-toolkit/issues/1895#issuecomment-902486183 which I am repasting partially here:
- Debian copyright files: there the declared license codes have no global meaning, therefore a mapping has no value. MIT may mean X11 in one copyright file, MIT/Expat in another file, or some old-style MIT in yet another copyright file. Therefore the only practical solution is rather more involved than a mapping and requires parsing, coupled with detection and a fine understanding of the structure of these files, and this has been implemented in packagedcode/debian_copyright.py ... the only mapping is for the 10 or so common licenses and this is the most trivial part of getting things correct.