license-expression
Provide built-in support for SPDX and scancode license expression validation
I would like to have a function that takes an expression string as an argument and validates it. It could be built on Licensing.parse(), but I would prefer that it return an object that tells me everything about the expression's validity:
- whether the syntax is valid, with error messages if it is not
- which license symbols are valid and which are invalid
- which exceptions are valid and which are invalid
- which license symbols are obsolete
This function should take as its source of license symbols either the ScanCode license DB ( https://scancode-licensedb.aboutcode.org ) or a list of symbols. It should bundle an up-to-date license list from ScanCode and SPDX for easy bootstrapping. For this we need https://github.com/nexB/scancode-licensedb/issues/7
In addition, it should support and accept arbitrary LicenseRef- (and possibly DocumentRef-) in SPDX mode.
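To make the request above more concrete, here is a rough, self-contained sketch of the kind of result object I have in mind. It is only an illustration: it uses a naive tokenizer instead of Licensing.parse(), it does not track exceptions separately, and every name in it (ValidationResult, validate, allow_licenseref) is hypothetical and not part of license-expression.

```python
import re
from dataclasses import dataclass, field

# Naive tokenizer: parentheses, or any run of non-space, non-paren characters.
_TOKEN = re.compile(r"\(|\)|[^\s()]+")
_OPERATORS = {"and", "or", "with"}


@dataclass
class ValidationResult:
    """Hypothetical result object; not part of license-expression."""
    expression: str
    syntax_errors: list = field(default_factory=list)
    valid_symbols: list = field(default_factory=list)
    invalid_symbols: list = field(default_factory=list)
    obsolete_symbols: list = field(default_factory=list)

    @property
    def is_valid(self):
        return not (self.syntax_errors or self.invalid_symbols)


def validate(expression, known, obsolete=(), allow_licenseref=False):
    """Check syntax and classify symbols against an explicit set of keys."""
    known = {k.lower() for k in known}
    obsolete = {k.lower() for k in obsolete}
    result = ValidationResult(expression)
    depth = 0
    expect_symbol = True  # True at the start and after an operator or "("
    for token in _TOKEN.findall(expression):
        key = token.lower()
        if token == "(":
            if not expect_symbol:
                result.syntax_errors.append("missing operator before '('")
            expect_symbol = True
            depth += 1
        elif token == ")":
            if expect_symbol:
                result.syntax_errors.append("missing symbol before ')'")
            expect_symbol = False
            depth -= 1
            if depth < 0:
                result.syntax_errors.append("unbalanced ')'")
                depth = 0
        elif key in _OPERATORS:
            if expect_symbol:
                result.syntax_errors.append(f"unexpected operator {token!r}")
            expect_symbol = True
        else:
            if not expect_symbol:
                result.syntax_errors.append(f"missing operator before {token!r}")
            expect_symbol = False
            # In "SPDX mode", accept arbitrary LicenseRef-/DocumentRef- symbols.
            is_ref = allow_licenseref and key.startswith(("licenseref-", "documentref-"))
            if key in obsolete:
                result.obsolete_symbols.append(token)
            if key in known or is_ref:
                result.valid_symbols.append(token)
            else:
                result.invalid_symbols.append(token)
    if depth:
        result.syntax_errors.append("unbalanced '('")
    if expect_symbol:
        result.syntax_errors.append("expression is empty or ends with an operator")
    return result


result = validate("foo AND mit", known={"mit", "apache-2.0"})
print(result.is_valid, result.invalid_symbols)  # False ['foo']
```

The point is that the caller gets one object back and never has to catch an exception to learn why an expression was rejected.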
@thatch @JonoYang ping ^
An example:
$ wget https://scancode-licensedb.aboutcode.org/index.json
$ python
>>> import json
>>> lics = json.load(open('index.json'))
>>> lics[0]
{'license_key': '389-exception', 'json': '389-exception.json', 'yml': '389-exception.yml', 'html': '389-exception.html', 'text': '389-exception.LICENSE'}
>>> from license_expression import LicenseSymbol, Licensing
>>> syms = [LicenseSymbol(l['license_key']) for l in lics]
>>> ling = Licensing(symbols=syms)
>>> ling.parse('foo AND mit')
AND(LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False))
>>> ling.parse('foo AND mit', validate=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/licexp/tmp/lib/python3.6/site-packages/license_expression/__init__.py", line 453, in parse
    raise ExpressionError(msg)
license_expression.ExpressionError: Unknown license key(s): foo
>>> e=ling.parse('foo AND mit')
>>> e.symbols
{LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False)}
@pombredanne When we are parsing a license expression using Licensing().parse(), should the .parse() method automatically determine whether an expression is an SPDX license expression or a ScanCode license expression, or should there be a flag that tells the .parse() method what kind of license expression to expect?
@JonoYang I think the new validation feature should be explicit about which license list is used as a base and there should be no guessing there about whether an expression is from scancode or from SPDX.
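As a small illustration of what "explicit" could look like, here is a sketch in which the caller must name the license list and nothing is inferred from the expression. The sample data is made up for the example, the spdx_license_key field is an assumption about the index format, and symbols_for and KEY_FIELDS are hypothetical names.

```python
# Sample entries loosely shaped like the index.json shown earlier; the
# 'spdx_license_key' field is an assumption made for this sketch.
SAMPLE_INDEX = [
    {"license_key": "mit", "spdx_license_key": "MIT"},
    {"license_key": "apache-2.0", "spdx_license_key": "Apache-2.0"},
    {"license_key": "389-exception", "spdx_license_key": "389-exception"},
]

# Maps each supported list name to the index field that holds its keys.
KEY_FIELDS = {"scancode": "license_key", "spdx": "spdx_license_key"}


def symbols_for(index, license_list):
    """Return the known keys for an explicitly named license list.

    There is deliberately no default and no auto-detection.
    """
    try:
        key_field = KEY_FIELDS[license_list]
    except KeyError:
        raise ValueError(
            f"license_list must be one of {sorted(KEY_FIELDS)}, got {license_list!r}"
        ) from None
    return {entry[key_field] for entry in index if entry.get(key_field)}


print(sorted(symbols_for(SAMPLE_INDEX, "spdx")))
# ['389-exception', 'Apache-2.0', 'MIT']
```

The resulting set could then be turned into LicenseSymbol objects for Licensing(symbols=...), as in the session above.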
In addition to validation, could you also provide a normalized (whitespace, case, parens) version of the string passed in?
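For what it's worth, here is roughly the behaviour I mean, as a minimal stand-alone sketch. It does not use the library's parser, the normalize name and canonical_keys parameter are hypothetical, and it only tidies spacing around parentheses rather than removing redundant ones.

```python
import re

# Naive tokenizer: parentheses, or any run of non-space, non-paren characters.
_TOKEN = re.compile(r"\(|\)|[^\s()]+")
_OPERATORS = {"and", "or", "with"}


def normalize(expression, canonical_keys):
    """Normalize whitespace, operator case, and symbol case.

    Symbols not found in canonical_keys are passed through unchanged.
    """
    canonical = {key.lower(): key for key in canonical_keys}
    tokens = []
    for token in _TOKEN.findall(expression):
        low = token.lower()
        if low in _OPERATORS:
            tokens.append(low.upper())
        else:
            tokens.append(canonical.get(low, token))
    # Join with single spaces, then pull parentheses tight against their contents.
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")


print(normalize("  mit and ( apache-2.0   OR gpl-2.0-only )",
                ["MIT", "Apache-2.0", "GPL-2.0-only"]))
# MIT AND (Apache-2.0 OR GPL-2.0-only)
```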