tools-python

Question regarding license_expression_parser behavior

Open · billie-alsup opened this issue 8 months ago · 2 comments

src/spdx_tools/spdx/parser/jsonlikedict/license_expression_parser.py calls Licensing().parse(expr) directly, rather than get_spdx_licensing().parse(expr) as parser/tagvalue/parser.py does. The two calls produce different LicenseSymbol objects for GPL-2.0, for example:

```pycon
>>> from license_expression import Licensing
>>> Licensing().parse('GPL-2.0')
LicenseSymbol('GPL-2.0', is_exception=False)
>>> from license_expression import get_spdx_licensing
>>> get_spdx_licensing().parse('GPL-2.0')
LicenseSymbol('GPL-2.0-only', aliases=('GPL-2.0', 'GPL 2.0', 'LicenseRef-GPL-2.0'), is_exception=False)
```

As you can see, GPL-2.0-only is the official ID and GPL-2.0 is an alias. However, parsing directly with Licensing() yields a GPL-2.0 node rather than a GPL-2.0-only node. This causes a problem later during validation, where GPL-2.0 is reported as an unrecognized license reference, for example:

```text
2024-06-18 16:28:31,476:WARNING:root: Unrecognized license reference: GPL-2.0. license_expression must only use IDs from the license list or extracted licensing info, but is: GPL-2.0
2024-06-18 16:28:31,476:WARNING:root: ValidationContext(spdx_id=None, parent_id='SPDXRef-base-files-Package-base-files', element_type=<SpdxElementType.LICENSE_EXPRESSION: 1>, full_element=LicenseSymbol('GPL-2.0', is_exception=False))
```

I'm wondering whether this is expected behavior (that is, you do not wish to allow aliases) or a bug. Should I preprocess my JSON file to replace GPL-2.0 with GPL-2.0-only? Certainly GPL-2.0 should not be listed in the extracted_licensing_info section (that would require renaming it to LicenseRef-GPL-2.0 or similar), right?
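
If preprocessing turns out to be the answer, a stdlib-only sketch like the following could rewrite deprecated IDs in a parsed SPDX 2.x JSON document (the dict returned by json.load) before handing it to tools-python. The DEPRECATED_TO_CANONICAL table is a small illustrative subset, and normalize_expression / normalize_document are hypothetical helpers. Replacement is done on whole tokens, so LicenseRef-GPL-2.0 (for example a licenseId in hasExtractedLicensingInfos) is left untouched.

```python
import re
from typing import Any

# Illustrative subset of deprecated SPDX IDs and their replacements (assumption:
# a real tool would derive this from the SPDX license list, not hard-code it).
DEPRECATED_TO_CANONICAL = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
    "GPL-3.0": "GPL-3.0-only",
    "LGPL-2.1": "LGPL-2.1-only",
}

# SPDX 2.x JSON fields holding a license expression or a list of them.
LICENSE_FIELDS = {
    "licenseConcluded",
    "licenseDeclared",
    "licenseInfoFromFiles",
    "licenseInfoInFiles",
    "licenseInfoInSnippets",
}

# One license ID token: letters, digits, '.', '+' and '-'. Operators
# (AND, OR, WITH) and parentheses pass through unchanged.
_TOKEN = re.compile(r"[A-Za-z0-9.+-]+")


def normalize_expression(expr: str) -> str:
    """Replace deprecated IDs that appear as whole tokens in expr."""
    return _TOKEN.sub(lambda m: DEPRECATED_TO_CANONICAL.get(m.group(0), m.group(0)), expr)


def normalize_document(node: Any) -> Any:
    """Recursively copy node, rewriting only the license expression fields."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key in LICENSE_FIELDS and isinstance(value, str):
                result[key] = normalize_expression(value)
            elif key in LICENSE_FIELDS and isinstance(value, list):
                result[key] = [
                    normalize_expression(v) if isinstance(v, str) else v for v in value
                ]
            else:
                result[key] = normalize_document(value)
        return result
    if isinstance(node, list):
        return [normalize_document(item) for item in node]
    return node


if __name__ == "__main__":
    print(normalize_expression("(GPL-2.0 OR MIT) AND LicenseRef-GPL-2.0"))
```

The token match is case-sensitive for simplicity; since SPDX license IDs are matched case-insensitively, a fuller version would normalize case before the lookup.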

billie-alsup · Jun 19 '24 00:06