fix(spdx2): Mark duplicates correctly with LicenseRef prefix
Description
The intention of deduplicateLicenseList() is de-duplicating licenses with the same SPDX ID. Duplicates are marked by adding a LicenseRef prefix and adding the MD5 hash of the license text to the shortname. Currently the sanity check for adding the prefix doesn't seem to be correct and LicenseRef might not be added. As a result duplicate licenses can end up in a generated report.
Changes
The current sanity check before adding the LicenseRef prefix doesn't seem to be correct:
if (! StringOperation::stringStartsWith(
$localList[$i + 1]->getLicenseObj()->getSpdxId(),
^^^^^^^^^^^^
LicenseRef::SPDXREF_PREFIX)) {
$newShortName = LicenseRef::SPDXREF_PREFIX .
$localList[$i + 1]->getLicenseObj()->getShortName();
The sanity check has to call getShortName() instead of getSpdxId():
if (! StringOperation::stringStartsWith(
$localList[$i + 1]->getLicenseObj()->getShortname(),
^^^^^^^^^^^^^^
LicenseRef::SPDXREF_PREFIX)) {
$newShortName = LicenseRef::SPDXREF_PREFIX .
$localList[$i + 1]->getLicenseObj()->getShortName();
Open question: Do you want to use SPDXREF_PREFIX or SPDXREF_PREFIX_FOSSOLOGY here?
How to test
- Take the latest version from master
- Manually create a potential duplicate for the report
- Create a license with the name "GPL-2.0-or-later WITH Libtool-exception"
- Create a license with the name "GPL-2.0-or-later-WITH-Libtool-exception"
- Upload and scan a package
- Conclude a couple of files with "GPL-2.0-or-later WITH Libtool-exception"
- Conclude a couple of files with "GPL-2.0-or-later-WITH-Libtool-exception"
- Create an SPDX tag value report
Without this commit the License Information section of the generated report will contain two licenses with LicenseID: LicenseRef-GPL-2.0-or-later-WITH-Libtool-exception With the change in this commit the duplicate is marked correctly.
To answer your question @JanAltenberg
Open question: Do you want to use SPDXREF_PREFIX or SPDXREF_PREFIX_FOSSOLOGY here?
SPDXREF_PREFIX is chosen as other agents like scancode can generate licenses with LicenseRef- prefix and we want to capture any license ref, not just from FOSSology and avoid situations like LicenseRef-LicenseRef-fossology-LicenseA.
To answer your question @JanAltenberg
Open question: Do you want to use SPDXREF_PREFIX or SPDXREF_PREFIX_FOSSOLOGY here?
SPDXREF_PREFIXis chosen as other agents like scancode can generate licenses withLicenseRef-prefix and we want to capture any license ref, not just from FOSSology and avoid situations likeLicenseRef-LicenseRef-fossology-LicenseA.
Makes sense. Thanks for the clarification :)
Hi @GMishx, Hi @shaheemazmalmmd,
thanks for the response and the approval so far. I just wanted to double-check if you need any further feedback or testing from my end.
Greetings Jan
@GMishx and @shaheemazmalmmd
It would be very nice, if you will merge the PR (given that everything is ok with it), because currenty the behaviour of FOSSology leads in the described scenarios to invalid SPDX documents and this requires manual postprocessing.
Thnank you very much in advance
Hi @GMishx, Hi @shaheemazmalmmd,
thanks for the response and the approval so far. I just wanted to double-check if you need any further feedback or testing from my end.
Greetings Jan
Thank you for
Hi @GMishx, Hi @shaheemazmalmmd,
thanks for the response and the approval so far. I just wanted to double-check if you need any further feedback or testing from my end.
Greetings Jan
Thanks @JanAltenberg . Merged :)