Multiple LicenseID in SPDX
Description
The SPDX standard states that "This identifier shall be unique within the SPDX document" (https://spdx.github.io/spdx-spec/v2.3/other-licensing-information-detected/).
In the attached SPDX file, some license IDs are defined multiple times:
grep LicenseID phpwiki.spdx.txt | sort | uniq -c
1 LicenseID: LicenseRef-scancode-bsd-unmodified
1 LicenseID: LicenseRef-scancode-commercial-license
1 LicenseID: LicenseRef-scancode-free-unknown
1 LicenseID: LicenseRef-scancode-mysql-linking-exception-2018
5 LicenseID: LicenseRef-scancode-other-permissive
20 LicenseID: LicenseRef-scancode-php-2.0.2
15 LicenseID: LicenseRef-scancode-proprietary-license
3 LicenseID: LicenseRef-scancode-public-domain
23 LicenseID: LicenseRef-scancode-unknown-license-reference
3 LicenseID: LicenseRef-scancode-unknown-spdx
1 LicenseID: LicenseRef-scancode-warranty-disclaimer
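For anyone who wants to run the same check from a script, here is a rough Python equivalent of the grep/sort/uniq pipeline above. It is only a sketch: the function names are mine, the sample input is made up for illustration, and it only looks at lines that start with the `LicenseID:` tag (so a line of extracted text that happens to start the same way would be counted too).

```python
from collections import Counter


def count_license_ids(spdx_text: str) -> Counter:
    """Count how many times each LicenseID is defined in SPDX tag-value text."""
    counts = Counter()
    for line in spdx_text.splitlines():
        # A definition starts the line with the "LicenseID:" tag.
        if line.startswith("LicenseID:"):
            counts[line.split(":", 1)[1].strip()] += 1
    return counts


def duplicated_license_ids(spdx_text: str) -> dict:
    """Return only the identifiers defined more than once, with their counts."""
    return {lid: n for lid, n in count_license_ids(spdx_text).items() if n > 1}


# Made-up sample, not the real phpwiki output.
sample = """\
LicenseID: LicenseRef-scancode-php-2.0.2
LicenseName: PHP License 2.0.2
LicenseID: LicenseRef-scancode-php-2.0.2
LicenseName: PHP License 2.0.2
LicenseID: LicenseRef-scancode-free-unknown
"""

print(duplicated_license_ids(sample))
```

An empty result would mean every `LicenseID` is defined exactly once, which is what the spec sentence quoted above asks for.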
How To Reproduce
svn checkout https://svn.code.sf.net/p/phpwiki/code/trunk phpwiki
./scancode -c -l -i --license-text --spdx-tv phpwiki.spdx phpwiki
Resulting SPDX file:
System configuration
./scancode --version
ScanCode version: 32.0.0rc1
ScanCode Output Format version: 3.0.0
SPDX License list version: 3.19
Ubuntu 22.10
The validator should now flag this. See https://github.com/spdx/spdx-java-tagvalue-store/issues/42 and https://github.com/spdx/spdx-java-tagvalue-store/pull/43
Actually we are using an SPDX namespace for our licenses, meaning these "LicenseRef-scancode" ids are as stable as the SPDX ids themselves and should not be treated the same.
Hi Philippe,
There are in fact two cases.
For LicenseRef-scancode-php-2.0.2, you have in the SPDX file 20 times the exact same text:
LicenseID: LicenseRef-scancode-php-2.0.2
LicenseName: PHP License 2.0.2
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/license/php-2.0.2.yml
</text>
ExtractedText: <text>// | This source file is subject to version 2.0 of the PHP license, |
// | that is bundled with this package in the file LICENSE, and is |
// | available at through the world-wide-web at |
// | http://www.php.net/license/2_02.txt. |
// | If you did not receive a copy of the PHP license and are unable to |
// | obtain it through the world-wide-web, please send a note to |
// | license@php.net so we can mail you a copy immediately. |</text>
It should be present only once. It is the definition of LicenseRef-scancode-php-2.0.2, so there is no need to repeat it.
For LicenseRef-scancode-unknown-spdx, you have:
LicenseID: LicenseRef-scancode-unknown-spdx
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * SPDX-License-Identifier: Artistic-1.0+
and also
LicenseID: LicenseRef-scancode-unknown-spdx
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * Adding SPDX-License-Identifier in PHP source files
This is not correct: you have two contradicting definitions of the same LicenseID.
And you cannot know which definition relates to which file.
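To tell the two cases apart automatically, something like the following could work. It is a minimal sketch, not how ScanCode or the SPDX validator does it: the helper names are hypothetical, the parser only understands `LicenseID:` and `ExtractedText:` lines (with or without a `<text>...</text>` wrapper), and the sample uses a shortened PHP header.

```python
def extracted_texts_by_id(spdx_text: str) -> dict:
    """Map each LicenseID to the list of ExtractedText values defined for it."""
    texts = {}
    current_id = None
    lines = iter(spdx_text.splitlines())
    for line in lines:
        if line.startswith("LicenseID:"):
            current_id = line.split(":", 1)[1].strip()
        elif line.startswith("ExtractedText:") and current_id is not None:
            value = line.split(":", 1)[1].strip()
            if value.startswith("<text>"):
                # Multi-line value: keep reading until the closing tag.
                while "</text>" not in value:
                    nxt = next(lines, None)
                    if nxt is None:
                        break
                    value += "\n" + nxt
                value = value.replace("<text>", "", 1).rsplit("</text>", 1)[0]
            texts.setdefault(current_id, []).append(value.strip())
    return texts


def classify_duplicates(spdx_text: str) -> dict:
    """Label each repeated LicenseID as 'identical' or 'conflicting'."""
    result = {}
    for license_id, texts in extracted_texts_by_id(spdx_text).items():
        if len(texts) < 2:
            continue  # defined once, nothing to report
        result[license_id] = "identical" if len(set(texts)) == 1 else "conflicting"
    return result


# Shortened sample based on the excerpts quoted in this comment.
sample = """\
LicenseID: LicenseRef-scancode-php-2.0.2
ExtractedText: <text>// | This source file is subject to version 2.0 of the PHP license, |
// | that is bundled with this package in the file LICENSE, and is |</text>
LicenseID: LicenseRef-scancode-php-2.0.2
ExtractedText: <text>// | This source file is subject to version 2.0 of the PHP license, |
// | that is bundled with this package in the file LICENSE, and is |</text>
LicenseID: LicenseRef-scancode-unknown-spdx
ExtractedText: <text> * SPDX-License-Identifier: Artistic-1.0+</text>
LicenseID: LicenseRef-scancode-unknown-spdx
ExtractedText: <text> * Adding SPDX-License-Identifier in PHP source files</text>
"""

print(classify_duplicates(sample))
```

"identical" entries only need to be emitted once, while "conflicting" entries need distinct identifiers.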
You should have something like:
# File
FileName: ./phpwiki/lib/HttpClient.php
SPDXID: SPDXRef-83
FileChecksum: SHA1: 99985858f0a2d539954e5bc6525892a6d6086ab9
LicenseConcluded: NOASSERTION
LicenseInfoInFile: LicenseRef-scancode-unknown-spdx-1
FileCopyrightText: <text>Copyright (c) 2003 Simon Willison, Incutio Limited
Copyright (c) 2004,2006-2007 Reini Urban
</text>
# File
FileName: ./phpwiki/locale/it/pgsrc/NoteDiRilascio
SPDXID: SPDXRef-636
FileChecksum: SHA1: 1d528511bfc1256c544321d1950fb06319ef0f9f
LicenseConcluded: NOASSERTION
LicenseInfoInFile: GPL-2.0-only
LicenseInfoInFile: LicenseRef-scancode-unknown-license-reference
LicenseInfoInFile: LicenseRef-scancode-unknown-spdx-2
FileCopyrightText: NONE
LicenseID: LicenseRef-scancode-unknown-spdx-1
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * SPDX-License-Identifier: Artistic-1.0+
and
LicenseID: LicenseRef-scancode-unknown-spdx-2
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * Adding SPDX-License-Identifier in PHP source files
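The numbering scheme suggested above could be sketched roughly like this. This is only an illustration under my own assumptions, not ScanCode code: `assign_unique_ids` and the `(file_name, license_id, extracted_text)` tuple shape are hypothetical, the `example/*.php` paths are placeholders, and the sketch does not check whether a generated `-1`/`-2` identifier collides with one that already exists.

```python
def assign_unique_ids(detections: list) -> tuple:
    """
    detections: list of (file_name, license_id, extracted_text) tuples.

    Returns (definitions, file_refs):
      definitions maps each unique LicenseID to its single extracted text,
      file_refs maps each file to the LicenseIDs it should reference.
    """
    # First pass: distinct texts per base id, in first-seen order.
    variants_by_id = {}
    for _, base_id, text in detections:
        variants = variants_by_id.setdefault(base_id, [])
        if text not in variants:
            variants.append(text)

    definitions = {}
    file_refs = {}
    # Second pass: suffix the id only when the base id has several texts.
    for file_name, base_id, text in detections:
        variants = variants_by_id[base_id]
        if len(variants) == 1:
            unique_id = base_id
        else:
            unique_id = f"{base_id}-{variants.index(text) + 1}"
        definitions[unique_id] = text  # identical texts collapse to one entry
        refs = file_refs.setdefault(file_name, [])
        if unique_id not in refs:
            refs.append(unique_id)
    return definitions, file_refs


PHP_HEADER = "// | This source file is subject to version 2.0 of the PHP license, |"

detections = [
    ("./phpwiki/lib/HttpClient.php", "LicenseRef-scancode-unknown-spdx",
     "* SPDX-License-Identifier: Artistic-1.0+"),
    ("./phpwiki/locale/it/pgsrc/NoteDiRilascio", "LicenseRef-scancode-unknown-spdx",
     "* Adding SPDX-License-Identifier in PHP source files"),
    # Placeholder paths, not taken from the real scan.
    ("example/a.php", "LicenseRef-scancode-php-2.0.2", PHP_HEADER),
    ("example/b.php", "LicenseRef-scancode-php-2.0.2", PHP_HEADER),
]

definitions, file_refs = assign_unique_ids(detections)
for license_id in definitions:
    print(license_id)
```

With this input the identical PHP header keeps the plain identifier and is defined once, while the two different unknown-spdx texts get `-1` and `-2`, and each file references the variant that was actually found in it.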
@pombredanne what do you think about these two cases?
Bug still present in scancode-toolkit 32.1.0