
Multiple LicenseID in SPDX

Open vargenau opened this issue 2 years ago • 5 comments

Description

SPDX standard states that "This identifier shall be unique within the SPDX document". https://spdx.github.io/spdx-spec/v2.3/other-licensing-information-detected/

In the attached SPDX file, some license ids are reported multiple times:

grep LicenseID phpwiki.spdx.txt | sort | uniq -c
      1 LicenseID: LicenseRef-scancode-bsd-unmodified
      1 LicenseID: LicenseRef-scancode-commercial-license
      1 LicenseID: LicenseRef-scancode-free-unknown
      1 LicenseID: LicenseRef-scancode-mysql-linking-exception-2018
      5 LicenseID: LicenseRef-scancode-other-permissive
     20 LicenseID: LicenseRef-scancode-php-2.0.2
     15 LicenseID: LicenseRef-scancode-proprietary-license
      3 LicenseID: LicenseRef-scancode-public-domain
     23 LicenseID: LicenseRef-scancode-unknown-license-reference
      3 LicenseID: LicenseRef-scancode-unknown-spdx
      1 LicenseID: LicenseRef-scancode-warranty-disclaimer
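
The same count can be sketched in Python, for example as an automated check on generated output. This is a minimal sketch, not part of ScanCode: the function names `count_license_ids` and `duplicated_license_ids` are hypothetical, and it assumes every identifier sits on its own `LicenseID:` line, as in the tag-value file above.

```python
from collections import Counter


def count_license_ids(spdx_text):
    """Count how often each LicenseID appears in SPDX tag-value text."""
    counts = Counter()
    for line in spdx_text.splitlines():
        # Only lines that start with the LicenseID tag define an identifier.
        if line.startswith("LicenseID:"):
            counts[line.split(":", 1)[1].strip()] += 1
    return counts


def duplicated_license_ids(spdx_text):
    """Return only the identifiers that are defined more than once."""
    return {lid: n for lid, n in count_license_ids(spdx_text).items() if n > 1}
```

An empty result from `duplicated_license_ids` would mean every LicenseID is defined exactly once in the document.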

How To Reproduce

svn checkout https://svn.code.sf.net/p/phpwiki/code/trunk phpwiki
./scancode -c -l -i --license-text --spdx-tv phpwiki.spdx phpwiki

Resulting SPDX file:

phpwiki.spdx.txt

System configuration

./scancode --version
ScanCode version: 32.0.0rc1
ScanCode Output Format version: 3.0.0
SPDX License list version: 3.19

Ubuntu 22.10

vargenau avatar Feb 15 '23 18:02 vargenau

The validator should now flag this. See https://github.com/spdx/spdx-java-tagvalue-store/issues/42 and https://github.com/spdx/spdx-java-tagvalue-store/pull/43

vargenau avatar Feb 27 '23 16:02 vargenau

Actually we are using an SPDX namespace for our licenses, meaning these "LicenseRef-scancode" ids are as stable as the SPDX ids themselves and should not be treated the same.

pombredanne avatar Feb 27 '23 17:02 pombredanne

Hi Philippe,

There are in fact two cases.

For LicenseRef-scancode-php-2.0.2, the SPDX file contains the exact same text 20 times:

LicenseID: LicenseRef-scancode-php-2.0.2
LicenseName: PHP License 2.0.2
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/license/php-2.0.2.yml
</text>
ExtractedText: <text>// | This source file is subject to version 2.0 of the PHP license,       |
// | that is bundled with this package in the file LICENSE, and is        |
// | available at through the world-wide-web at                           |
// | http://www.php.net/license/2_02.txt.                                 |
// | If you did not receive a copy of the PHP license and are unable to   |
// | obtain it through the world-wide-web, please send a note to          |
// | license@php.net so we can mail you a copy immediately.               |</text>

It should be present only once: it is the definition of LicenseRef-scancode-php-2.0.2, so there is no need to repeat it.

For LicenseRef-scancode-unknown-spdx, you have:

LicenseID: LicenseRef-scancode-unknown-spdx
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText:  * SPDX-License-Identifier: Artistic-1.0+

and also

LicenseID: LicenseRef-scancode-unknown-spdx
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * Adding SPDX-License-Identifier in PHP source files

This is not correct: you have two contradictory definitions of the same LicenseID, and you cannot know which definition relates to which file.
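
The two cases can be told apart mechanically. The following is a rough sketch under stated assumptions, not ScanCode code: the function names are hypothetical, and the parsing assumes the layout shown in this thread (a `LicenseID:` line followed by an `ExtractedText:` value that is either a single line or wrapped in `<text>...</text>`). Identical repeats collapse to one entry, so only identifiers with genuinely different texts are reported.

```python
from collections import defaultdict


def _clean(value):
    """Drop the <text> wrappers and surrounding whitespace."""
    return value.replace("<text>", "").replace("</text>", "").strip()


def extracted_texts_by_id(spdx_text):
    """Map each LicenseID to the set of distinct ExtractedText values."""
    texts = defaultdict(set)
    current_id = None
    buffer = None  # lines of a multi-line <text>...</text> value being read
    for line in spdx_text.splitlines():
        if buffer is not None:
            buffer.append(line)
            if "</text>" in line:
                texts[current_id].add(_clean("\n".join(buffer)))
                buffer = None
        elif line.startswith("LicenseID:"):
            current_id = line.split(":", 1)[1].strip()
        elif line.startswith("ExtractedText:") and current_id is not None:
            value = line.split(":", 1)[1]
            if "<text>" in value and "</text>" not in value:
                buffer = [value]  # value continues on the following lines
            else:
                texts[current_id].add(_clean(value))
    return texts


def conflicting_license_ids(spdx_text):
    """Return the ids defined with more than one distinct ExtractedText."""
    return {lid: t for lid, t in extracted_texts_by_id(spdx_text).items() if len(t) > 1}
```

With this split, an id such as LicenseRef-scancode-php-2.0.2 would only need deduplication, while an id such as LicenseRef-scancode-unknown-spdx would show up in `conflicting_license_ids`.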

You should have something like:

# File

FileName: ./phpwiki/lib/HttpClient.php
SPDXID: SPDXRef-83
FileChecksum: SHA1: 99985858f0a2d539954e5bc6525892a6d6086ab9
LicenseConcluded: NOASSERTION
LicenseInfoInFile: LicenseRef-scancode-unknown-spdx-1
FileCopyrightText: <text>Copyright (c) 2003 Simon Willison, Incutio Limited
Copyright (c) 2004,2006-2007 Reini Urban
</text>

# File

FileName: ./phpwiki/locale/it/pgsrc/NoteDiRilascio
SPDXID: SPDXRef-636
FileChecksum: SHA1: 1d528511bfc1256c544321d1950fb06319ef0f9f
LicenseConcluded: NOASSERTION
LicenseInfoInFile: GPL-2.0-only
LicenseInfoInFile: LicenseRef-scancode-unknown-license-reference
LicenseInfoInFile: LicenseRef-scancode-unknown-spdx-2
FileCopyrightText: NONE

LicenseID: LicenseRef-scancode-unknown-spdx-1
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText:  * SPDX-License-Identifier: Artistic-1.0+

and

LicenseID: LicenseRef-scancode-unknown-spdx-2
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText: * Adding SPDX-License-Identifier in PHP source files
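
One way to produce such identifiers is sketched below. This is only an illustration of the numbering idea, not how ScanCode builds its SPDX output: `assign_unique_ids` is a hypothetical helper, and it assumes the detections are available as (license id, extracted text) pairs. An id with a single distinct text keeps its name; an id with several distinct texts gets a numeric suffix in first-seen order.

```python
def assign_unique_ids(detections):
    """Map each distinct (license_id, extracted_text) pair to a unique id.

    detections: iterable of (license_id, extracted_text) pairs.
    """
    # Collect the distinct texts seen for each id, in first-seen order.
    distinct = {}
    for license_id, text in detections:
        texts = distinct.setdefault(license_id, [])
        if text not in texts:
            texts.append(text)

    unique = {}
    for license_id, texts in distinct.items():
        if len(texts) == 1:
            # A single definition: the id is already unique, keep it as-is.
            unique[(license_id, texts[0])] = license_id
        else:
            # Several different texts: number them -1, -2, ...
            for n, text in enumerate(texts, start=1):
                unique[(license_id, text)] = f"{license_id}-{n}"
    return unique
```

Each file's LicenseInfoInFile would then refer to the suffixed id that matches the text found in that file. A real implementation would also have to make sure that a suffixed id does not collide with an existing identifier.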

vargenau avatar Feb 27 '23 19:02 vargenau

@pombredanne what do you think about these two cases?

vargenau avatar Mar 28 '23 14:03 vargenau

Bug still present in scancode-toolkit 32.1.0

vargenau avatar Apr 11 '24 11:04 vargenau