LGPL-3.0-or-later wrongly identified as GPL-2.0-or-later
Description
I was scanning https://gitlab.com/tango-controls/pytango for a demo and noticed that the following text gets identified as GPL-2.0-or-later:
# -----------------------------------------------------------------------------
# This file is part of PyTango (http://pytango.rtfd.io)
#
# Copyright 2006-2012 CELLS / ALBA Synchrotron, Bellaterra, Spain
# Copyright 2013-2014 European Synchrotron Radiation Facility, Grenoble, France
#
# Distributed under the terms of the GNU Lesser General Public License,
# either version 3 of the License, or (at your option) any later version.
# See LICENSE.txt for more info.
# -----------------------------------------------------------------------------
… while it seems pretty clear to be LGPL-3.0-or-later instead.
How To Reproduce
Tested with ScanCode Toolkit v32.2.1 (both via the CLI and through ScanCode.io):
scancode . --info --license --copyright --spdx-tv pytango.spdx
… and check the output
P.S. Happy to provide more useful bug reports on license mismatches, if there is a better way to do that.
@silverhook Thanks for the report. This is excellent as it is and has all the details needed.
@pombredanne, it would be great if there were a simple way to report a license bug, e.g. a button in ScanCode.io’s file view where you see the license detection details, and/or a command in ScanCode Toolkit. It would help file a useful issue, both 1) making it easier and faster for users to report and 2) making sure you get a report that you know is useful.
@silverhook I was thinking about exactly that yesterday after I saw your comments!
And the simplest approach would likely be to use this https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/creating-an-issue#creating-an-issue-from-a-url-query , i.e. crafting a link that, when opened, would pre-populate a new issue with all the needed details (BUT NOT create the issue: you would still have to click to save it).
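As a rough sketch of what such a link builder could look like: the `title`, `body` and `labels` query parameters are the ones described on the GitHub page linked above, while the `build_issue_url` helper, the issue body layout and the choice of target repository are only assumptions for illustration, not an existing ScanCode feature.

```python
from urllib.parse import urlencode

# Assumption: issues are filed against the repository seen in the rule URLs.
NEW_ISSUE_URL = "https://github.com/nexB/scancode-toolkit/issues/new"


def build_issue_url(title, body, labels=()):
    """Return a URL that opens a pre-filled, not yet submitted, issue form."""
    params = {"title": title, "body": body}
    if labels:
        # GitHub accepts several labels as one comma-separated value.
        params["labels"] = ",".join(labels)
    return f"{NEW_ISSUE_URL}?{urlencode(params)}"


# Hypothetical report body assembled from the details of one license match.
body = "\n".join([
    "Detected: GPL-2.0-or-later",
    "Expected: LGPL-3.0-or-later",
    "Rule: gpl-2.0-plus_1100.RULE",
    "Matched text:",
    "# Distributed under the terms of the GNU Lesser General Public License,",
    "# either version 3 of the License, or (at your option) any later version.",
])
print(build_issue_url("LGPL-3.0-or-later wrongly identified as GPL-2.0-or-later", body))
```

Opening the printed link shows the new-issue form with the fields filled in; nothing is created until the user submits it.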
As for diagnostics, this can be improved, but we already have these in ScanCode.io. Using https://gitlab.com/tango-controls/pytango/-/raw/develop/tango/asyncio_executor.py?ref_type=heads as an input:
- create the project
- run the scan, then open the resource to view its content and highlight the detected license text by clicking on "Licenses" in the upper left corner of the "viewer" tab. Hovering in the margin will show the detected license expression.
- Or go to the detection tab to view the details of each license match
And for ScanCode Toolkit,
$ wget "https://gitlab.com/tango-controls/pytango/-/raw/develop/tango/asyncio_executor.py"
$ scancode --license --license-text --license-text-diagnostics --yaml - asyncio_executor.py
will display this YAML on screen:
files:
  - path: asyncio_executor.py
    type: file
    detected_license_expression: gpl-2.0-plus AND unknown-license-reference
    detected_license_expression_spdx: GPL-2.0-or-later AND LicenseRef-scancode-unknown-license-reference
    license_detections:
      - license_expression: gpl-2.0-plus AND unknown-license-reference
        license_expression_spdx: GPL-2.0-or-later AND LicenseRef-scancode-unknown-license-reference
        matches:
          - license_expression: gpl-2.0-plus
            spdx_license_expression: GPL-2.0-or-later
            from_file: asyncio_executor.py
            start_line: 7
            end_line: 8
            matcher: 3-seq
            score: '60.61'
            matched_length: 20
            match_coverage: '60.61'
            rule_relevance: 100
            rule_identifier: gpl-2.0-plus_1100.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_1100.RULE
            matched_text: |
              # Distributed under the terms of the GNU Lesser General Public License,
              # either version 3 of the License, or (at your option) any later version.
            matched_text_diagnostics: |
              Distributed under the terms of the GNU [Lesser] General Public License,
              # [either] [version] [3] of the License, or (at your option) any later version.
          - license_expression: unknown-license-reference
            spdx_license_expression: LicenseRef-scancode-unknown-license-reference
            from_file: asyncio_executor.py
            start_line: 9
            end_line: 9
            matcher: 2-aho
            score: '90.0'
            matched_length: 3
            match_coverage: '100.0'
            rule_relevance: 90
            rule_identifier: unknown-license-reference_46.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_46.RULE
            matched_text: '# See LICENSE.txt for more info.'
            matched_text_diagnostics: See LICENSE.txt
        identifier: gpl_2_0_plus_and_unknown_license_reference-c06981f7-0c4f-f6d2-f69d-5f678d251cc1
Here, the parts of matched_text_diagnostics enclosed in [ square brackets ] were NOT matched, which is a good indication of the detection problem:
matched_text_diagnostics: |
  Distributed under the terms of the GNU [Lesser] General Public License,
  # [either] [version] [3] of the License, or (at your option) any later version.
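To make the bracket convention concrete, here is a minimal sketch that pulls the unmatched words out of a matched_text_diagnostics string. The `unmatched_tokens` helper is hypothetical and not part of ScanCode; it assumes the scanned text itself contains no literal square brackets, which would be indistinguishable from the markers.

```python
import re

# One bracketed run with no nested brackets, e.g. "[Lesser]".
UNMATCHED = re.compile(r"\[([^\[\]]+)\]")


def unmatched_tokens(diagnostics):
    """Return the words that the rule did NOT match, in order of appearance."""
    return UNMATCHED.findall(diagnostics)


diagnostics = (
    "Distributed under the terms of the GNU [Lesser] General Public License,\n"
    "# [either] [version] [3] of the License, or (at your option) any later version.\n"
)
print(unmatched_tokens(diagnostics))  # ['Lesser', 'either', 'version', '3']
```

Seeing "Lesser" and "3" among the unmatched words is exactly what points at the wrong license family and version in this report.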
So not perfect but helpful... and this should/could be combined with an issue creation helper... and we should also update the docs with details like the comments I posted here.
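One possible shape for the diagnostics half of such a helper, as a sketch only: it walks scan results that have already been loaded into Python dicts (from the YAML above or the equivalent JSON output) and keeps the matches that cover their rule only partially, since those are the likeliest candidates for a report. The `partial_matches` name and the coverage threshold are assumptions; the field names are the ones visible in the output above.

```python
def partial_matches(scan, max_coverage=100.0):
    """Yield (path, match) for every match covering less than max_coverage percent."""
    for resource in scan.get("files", []):
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                # The YAML output quotes the value ('60.61'); float() accepts both forms.
                if float(match["match_coverage"]) < max_coverage:
                    yield resource["path"], match


# Trimmed copy of the scan shown in this thread.
scan = {
    "files": [{
        "path": "asyncio_executor.py",
        "license_detections": [{
            "matches": [
                {"license_expression": "gpl-2.0-plus",
                 "match_coverage": "60.61",
                 "rule_identifier": "gpl-2.0-plus_1100.RULE"},
                {"license_expression": "unknown-license-reference",
                 "match_coverage": "100.0",
                 "rule_identifier": "unknown-license-reference_46.RULE"},
            ],
        }],
    }],
}

for path, match in partial_matches(scan):
    print(path, match["rule_identifier"], match["match_coverage"])
```

On the trimmed data this flags only the gpl-2.0-plus match, whose details could then be fed into a pre-filled issue.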
@pombredanne, I think your proposed approach would already improve the workflow quite a bit, yes :)