SPDX Failing to parse license for no obvious reason

Open DamianBarabonkovQC opened this issue 1 year ago • 1 comments

Hi license-expression. I must begin by saying that this is a great piece of software, and I'm grateful for your contributions.

I noticed a strange edge case when using the spdx license parser. The parser raises an exception when I try to parse Sleepycat License but is fine with Sleepydog License or even Sleepyca License.

Reproducible example:

import license_expression

SPDX_LICENSING = license_expression.get_spdx_licensing()

# ExpressionParseError: Invalid symbols sequence such as (A B) for token: "License" at position: 10
_ = SPDX_LICENSING.parse('Sleepycat License')

# Works
_ = SPDX_LICENSING.parse('Sleepydog License')
_ = SPDX_LICENSING.parse('Sleepyca License')

Relevant versions installed via conda.

python                    3.11.4          h47c9636_0_cpython    conda-forge
license-expression        30.1.1             pyhd8ed1ab_0    conda-forge

Thanks in advance!

DamianBarabonkovQC, Aug 29 '23 12:08

@DamianBarabonkovQC Thanks for the report. Here is what happens:

  1. Sleepycat is a known SPDX identifier
  2. Sleepyca, Sleepydog and License are not known identifiers

The basic, non-validating parse does not report an error when it recognizes no known identifier in the expression. If you use .validate() https://github.com/nexB/license-expression/blob/dd54f5125428fc070637b7db6ca780b2cda63ca3/src/license_expression/__init__.py#L754 or .parse(validate=True) https://github.com/nexB/license-expression/blob/dd54f5125428fc070637b7db6ca780b2cda63ca3/src/license_expression/__init__.py#L472 the expressions in 2. will fail to parse too.

Somehow, for the expression in 1., the SPDX identifier "Sleepycat" is recognized, and the parser then goes on to validate the rest of the expression.

I reckon the behaviour is inconsistent and buggy.

  1. yields:
    raise ExpressionParseError(
license_expression.ExpressionParseError: Invalid symbols sequence such as (A B) for token: "License" at position: 10
  2. with validate yields:
>>> _ = SPDX_LICENSING.parse('Sleepyca License', validate=True)
...
    raise ExpressionError(msg)
license_expression.ExpressionError: Unknown license key(s): Sleepyca License

The first failure, from non-validated parsing, is probably OK. The second case, which currently parses without error when not validating, should fail as well, either with Unknown license key(s) or, better, with an Invalid symbols sequence error too.
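
To make the difference between the two modes easier to see side by side, here is a minimal sketch that runs each expression through .parse() with and without validate=True and records whether it raised. The helper names try_parse and compare_modes are hypothetical and not part of license-expression; the only library calls assumed are get_spdx_licensing() and parse(expression, validate=...), both of which appear in this thread. The exact results may differ between versions.

```python
def try_parse(licensing, expression, validate=False):
    """Return (ok, detail) instead of raising (hypothetical helper)."""
    try:
        parsed = licensing.parse(expression, validate=validate)
    except Exception as exc:
        # Broad catch on purpose: this sketch only records which error
        # class was raised, e.g. ExpressionParseError or ExpressionError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(parsed)


def compare_modes(licensing, expressions):
    """Map each expression to whether it parses in each mode."""
    return {
        expression: {
            "default": try_parse(licensing, expression)[0],
            "validated": try_parse(licensing, expression, validate=True)[0],
        }
        for expression in expressions
    }


try:
    from license_expression import get_spdx_licensing
except ImportError:
    # The helpers above still work with any object exposing .parse().
    get_spdx_licensing = None

if get_spdx_licensing is not None:
    spdx_licensing = get_spdx_licensing()
    samples = ["Sleepycat License", "Sleepydog License", "Sleepyca License"]
    for expression, modes in compare_modes(spdx_licensing, samples).items():
        print(expression, modes)
```

An expression that shows default as True but validated as False is one that the non-validating parse lets through silently, which is the inconsistency described above.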

pombredanne, Sep 04 '23 23:09