scancode-toolkit

LGPL-3.0-or-later wrongly identified as GPL-2.0-or-later

Open silverhook opened this issue 1 year ago • 7 comments

Description

I was scanning https://gitlab.com/tango-controls/pytango for a demo and noticed that the following text gets identified as GPL-2.0-or-later:

# -----------------------------------------------------------------------------
# This file is part of PyTango (http://pytango.rtfd.io)
#
# Copyright 2006-2012 CELLS / ALBA Synchrotron, Bellaterra, Spain
# Copyright 2013-2014 European Synchrotron Radiation Facility, Grenoble, France
#
# Distributed under the terms of the GNU Lesser General Public License,
# either version 3 of the License, or (at your option) any later version.
# See LICENSE.txt for more info.
# -----------------------------------------------------------------------------

… while it seems pretty clear to be LGPL-3.0-or-later instead.

How To Reproduce

Tested with ScanCode Toolkit v32.2.1 (both in the CLI and through ScanCode.io):

scancode . --info --license --copyright --spdx-tv pytango.spdx

… and check the output


P.S. Happy to provide more useful bug reports on license mismatches, if there is a better way to do that.

silverhook • Sep 30 '24 17:09

@silverhook Thanks for the report. This is excellent as it is and has all the details needed.

pombredanne • Oct 02 '24 08:10

@pombredanne, it would be great if there were a simple way to report a license bug (e.g. a button in ScanCode.io’s file view where you see the license detection details, and/or a command in ScanCode Toolkit). It would help file a useful issue, which would both 1) make it easier/faster for users to report and 2) make sure you get a report that you know is useful.

silverhook • Oct 02 '24 08:10

@silverhook I was thinking about exactly that yesterday after I saw your comments!

pombredanne • Oct 02 '24 08:10

And the simplest approach would likely be to use this https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/creating-an-issue#creating-an-issue-from-a-url-query , e.g. crafting a link that, when opened, would pre-populate a new issue with all the needed details (BUT NOT create the issue; you would have to click to save it).
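
A minimal sketch of how such a link could be crafted, assuming the `title` and `body` query parameters described on that GitHub page. The helper name `build_issue_link` and the body layout are hypothetical; the repository URL is the one used in the rule links of the scan output in this thread.

```python
from urllib.parse import urlencode

# Repository taken from the rule_url links in this thread.
NEW_ISSUE_URL = "https://github.com/nexB/scancode-toolkit/issues/new"


def build_issue_link(title: str, body: str) -> str:
    """Return a URL that opens the new-issue form with title and body pre-filled.

    Hypothetical helper: opening the link only fills in the form,
    the user still has to review and submit the issue.
    """
    query = urlencode({"title": title, "body": body})
    return f"{NEW_ISSUE_URL}?{query}"


link = build_issue_link(
    title="LGPL-3.0-or-later wrongly identified as GPL-2.0-or-later",
    body=(
        "Detected: GPL-2.0-or-later\n"
        "Expected: LGPL-3.0-or-later\n"
        "ScanCode Toolkit: v32.2.1"
    ),
)
print(link)
```

`urlencode` percent-escapes the newlines and encodes spaces as `+`, so the whole report fits into a single clickable link.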

pombredanne • Oct 02 '24 08:10

As for diagnostics, there is room for improvement, but ScanCode.io already offers the following, using https://gitlab.com/tango-controls/pytango/-/raw/develop/tango/asyncio_executor.py?ref_type=heads as input:

  1. Create the project.

  2. Run the scan, then open the resource to view its content and highlight the detected license text by clicking on "Licenses" in the upper left corner of the "viewer" tab. Hovering in the margin will show the detected license expression.

  3. Or go to the detection tab to view the details of each license match.

pombredanne • Oct 02 '24 08:10

And for ScanCode Toolkit,

$ wget "https://gitlab.com/tango-controls/pytango/-/raw/develop/tango/asyncio_executor.py"
$ scancode --license --license-text --license-text-diagnostics --yaml - asyncio_executor.py

will display this YAML on screen:

files:
    -   path: asyncio_executor.py
        type: file
        detected_license_expression: gpl-2.0-plus AND unknown-license-reference
        detected_license_expression_spdx: GPL-2.0-or-later AND LicenseRef-scancode-unknown-license-reference
        license_detections:
            -   license_expression: gpl-2.0-plus AND unknown-license-reference
                license_expression_spdx: GPL-2.0-or-later AND LicenseRef-scancode-unknown-license-reference
                matches:
                    -   license_expression: gpl-2.0-plus
                        spdx_license_expression: GPL-2.0-or-later
                        from_file: asyncio_executor.py
                        start_line: 7
                        end_line: 8
                        matcher: 3-seq
                        score: '60.61'
                        matched_length: 20
                        match_coverage: '60.61'
                        rule_relevance: 100
                        rule_identifier: gpl-2.0-plus_1100.RULE
                        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_1100.RULE
                        matched_text: |
                            # Distributed under the terms of the GNU Lesser General Public License,
                            # either version 3 of the License, or (at your option) any later version.
                        matched_text_diagnostics: |
                            Distributed under the terms of the GNU [Lesser] General Public License,
                            # [either] [version] [3] of the License, or (at your option) any later version.
                    -   license_expression: unknown-license-reference
                        spdx_license_expression: LicenseRef-scancode-unknown-license-reference
                        from_file: asyncio_executor.py
                        start_line: 9
                        end_line: 9
                        matcher: 2-aho
                        score: '90.0'
                        matched_length: 3
                        match_coverage: '100.0'
                        rule_relevance: 90
                        rule_identifier: unknown-license-reference_46.RULE
                        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_46.RULE
                        matched_text: '# See LICENSE.txt for more info.'
                        matched_text_diagnostics: See LICENSE.txt
                identifier: gpl_2_0_plus_and_unknown_license_reference-c06981f7-0c4f-f6d2-f69d-5f678d251cc1

Here, the parts of matched_text_diagnostics enclosed in [ square brackets ] were NOT matched, which is a good indication of the detection problem:

matched_text_diagnostics: |
 Distributed under the terms of the GNU [Lesser] General Public License,
 # [either] [version] [3] of the License, or (at your option) any later version.
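
The bracketed words can also be pulled out programmatically. This is only a sketch: the helper name `unmatched_tokens` is hypothetical, and it assumes the diagnostics text uses plain `[word]` markers as in the excerpt above.

```python
import re

# Diagnostics text copied from the gpl-2.0-plus match shown above.
diagnostics = (
    "Distributed under the terms of the GNU [Lesser] General Public License,\n"
    "# [either] [version] [3] of the License, or (at your option) any later version."
)


def unmatched_tokens(text: str) -> list[str]:
    """Return the words found inside [square brackets], in order of appearance."""
    return re.findall(r"\[([^\]]+)\]", text)


print(unmatched_tokens(diagnostics))
# ['Lesser', 'either', 'version', '3']
```

The unmatched "Lesser" and "3" are exactly the words that separate LGPL-3.0-or-later from the reported GPL-2.0-or-later.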

So not perfect but helpful... and this should/could be combined with an issue creation helper... and we should also update the docs with details like the comments I posted here.
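
As a rough sketch of what such a helper might put in the report text: partial matches can be picked out by their `match_coverage` and summarized. The function name `format_report` and the report layout are hypothetical, not part of ScanCode; the two matches are copied from the YAML output above and trimmed to the fields used here.

```python
# Matches copied from the YAML output above, trimmed to the fields used below.
matches = [
    {
        "spdx_license_expression": "GPL-2.0-or-later",
        "from_file": "asyncio_executor.py",
        "start_line": 7,
        "end_line": 8,
        "matcher": "3-seq",
        "match_coverage": "60.61",
        "rule_identifier": "gpl-2.0-plus_1100.RULE",
        "matched_text_diagnostics": (
            "Distributed under the terms of the GNU [Lesser] General Public License,\n"
            "# [either] [version] [3] of the License, or (at your option) any later version."
        ),
    },
    {
        "spdx_license_expression": "LicenseRef-scancode-unknown-license-reference",
        "from_file": "asyncio_executor.py",
        "start_line": 9,
        "end_line": 9,
        "matcher": "2-aho",
        "match_coverage": "100.0",
        "rule_identifier": "unknown-license-reference_46.RULE",
        "matched_text_diagnostics": "See LICENSE.txt",
    },
]


def format_report(match: dict) -> str:
    """Build a plain-text summary of one license match (hypothetical layout)."""
    return "\n".join([
        f"Detected: {match['spdx_license_expression']}",
        f"Rule: {match['rule_identifier']} (matcher {match['matcher']}, "
        f"coverage {match['match_coverage']}%)",
        f"Location: {match['from_file']}, lines {match['start_line']}-{match['end_line']}",
        "Diagnostics (unmatched words are in [brackets]):",
        match["matched_text_diagnostics"],
    ])


# A coverage below 100 means part of the rule text was not found in the file.
partial = [m for m in matches if float(m["match_coverage"]) < 100]
for m in partial:
    print(format_report(m))
```

Only the gpl-2.0-plus match is reported here, since the license reference on line 9 was matched with full coverage.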

pombredanne • Oct 02 '24 09:10

@pombredanne, I think your proposed approach would already improve the workflow quite a bit, yes :)

silverhook • Oct 04 '24 18:10