reuse-tool
The tool currently deals very poorly with erroneous SPDX expressions
Given a file erroneous-spdx.txt:
SPDX-Copyright: Carmen
SPDX-License-Identifier: MIT OR BSD AND
The output of reuse lint is:
reuse._util - ERROR - Could not parse 'MIT OR BSD AND'
reuse.project - ERROR - erroneous-spdx.txt holds an SPDX expression that cannot be parsed, skipping the file
NO LICENSE
The following files have no license(s):
erroneous-spdx.txt
NO COPYRIGHT
The following files have no copyright:
erroneous-spdx.txt
SUMMARY
Bad licenses: 0
Missing licenses: 0
Unused licenses: 0
Used licenses: Apache-2.0, CC-BY-SA-4.0, CC0-1.0, GPL-3.0-or-later
Read errors: 0
Files with copyright information: 47 / 48
Files with license information: 47 / 48
The ERROR statements are just logger output from within the program. The file is then completely skipped over, and its (completely valid) SPDX-Copyright tag is ignored.
Is this sufficient, or should the plumbing somehow change to account for this edge case?
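One possible direction, sketched very roughly: parse the copyright and license tags independently, so that a broken license expression is recorded as an error without discarding a valid SPDX-Copyright tag in the same file. Everything here is an assumption for illustration. The regexes, `is_well_formed`, and `extract_info` are hypothetical names and not the tool's actual internals, and `is_well_formed` is a crude stand-in for a real SPDX expression parser.

```python
import re

# Hypothetical patterns; the real tool's regexes may differ.
COPYRIGHT_RE = re.compile(r"SPDX-Copyright:\s*(?P<value>.+)")
LICENSE_RE = re.compile(r"SPDX-License-Identifier:\s*(?P<value>.+)")
OPERATORS = {"AND", "OR", "WITH"}


def is_well_formed(expression: str) -> bool:
    """Crude stand-in for a real SPDX expression parser.

    Only checks that identifiers and operators alternate and that the
    expression does not end in an operator. Parentheses are not supported.
    """
    tokens = expression.split()
    if not tokens:
        return False
    for index, token in enumerate(tokens):
        expects_operator = index % 2 == 1
        if (token in OPERATORS) != expects_operator:
            return False
    return tokens[-1] not in OPERATORS


def extract_info(text: str):
    """Return (copyrights, licenses, errors) for one file's text.

    Copyright lines are collected even when a license expression in the
    same file cannot be parsed.
    """
    copyrights = [m.group("value").strip() for m in COPYRIGHT_RE.finditer(text)]
    licenses, errors = [], []
    for match in LICENSE_RE.finditer(text):
        expression = match.group("value").strip()
        if is_well_formed(expression):
            licenses.append(expression)
        else:
            errors.append(expression)
    return copyrights, licenses, errors
```

With the example file above, this sketch would keep "Carmen" as the copyright holder and report 'MIT OR BSD AND' as the only error, so the file would count as having copyright information but no usable license.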
I think it's OK to let this be an error. We should strongly discourage erroneous SPDX expressions, so that reusing software is easy and unambiguous rather than guesswork.
See the stupid edge cases Thomas is dealing with in Linux for an example of how things can explode into a massive rework even with minor errors ;)
Can you give me more info on what Thomas is currently dealing with? Maybe an article or e-mail I can read.
Sure. It's being discussed on the [email protected] mailing list. First post
This bug is referred to by the documentation in #80. When this bug is fixed, the documentation should reflect that.
Somewhat related: reuse addheader lets you add any string you want as a license, e.g.
reuse addheader foobar/__init__.py --license GPLv33
Shouldn't the tool report this as an invalid license identifier and abort the operation, similar to how reuse init behaves?
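To make the request concrete, here is a minimal sketch of the kind of check meant, assuming the identifier is validated before any header is written. `KNOWN_LICENSES`, `UnknownLicenseError`, and `check_license` are hypothetical names for illustration, not part of the tool; a real check would load the full SPDX license list rather than this tiny subset.

```python
# Illustrative subset only; a real check would use the full SPDX license list.
KNOWN_LICENSES = {"MIT", "Apache-2.0", "CC0-1.0", "GPL-3.0-or-later"}


class UnknownLicenseError(ValueError):
    """Raised when an identifier is not a recognised SPDX license."""


def check_license(identifier: str) -> str:
    """Return the identifier unchanged, or raise if it is not recognised.

    Custom LicenseRef- identifiers are let through, since SPDX allows them.
    """
    if identifier in KNOWN_LICENSES or identifier.startswith("LicenseRef-"):
        return identifier
    raise UnknownLicenseError(
        f"'{identifier}' is not a known SPDX license identifier"
    )
```

Under this sketch, `check_license("GPLv33")` raises before anything is written, while `check_license("MIT")` passes through.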
As I felt the issue raised by @bittner is quite specific, I forked it off into a separate issue.
In the example in #463, people will see the following error, even if the block ignore is implemented:
reuse._util - ERROR - Could not parse 'MIT" > file.txt'
reuse.project - ERROR - 'foobar.sh' holds an SPDX expression that cannot be parsed, skipping the file
The suggestion is to make this error more understandable and solvable for users:
- Collect these errors, and only display them near the summary block
- Combine and explain these errors in a better fashion, e.g. "The files contain text strings that confuse the REUSE tool. It cannot reliably understand what's the actual license and/or copyright. Please see $URL for an explanation and solution."
- Create the FAQ item ($URL) that explains the source of the problem and tells people to wrap the problematic lines in block ignores (#463).
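A rough sketch of the first two points, assuming parse errors are collected while files are scanned and rendered once, next to the summary block. `ParseErrorReport`, its methods, and the section heading are hypothetical and not existing tool code; `$URL` is kept as the same placeholder used in the suggestion.

```python
from dataclasses import dataclass, field

HELP_URL = "$URL"  # placeholder for the FAQ item, not a real address


@dataclass
class ParseErrorReport:
    """Collects unparseable expressions instead of logging them right away."""

    unparseable: dict = field(default_factory=dict)  # path -> expression

    def add(self, path: str, expression: str) -> None:
        self.unparseable[path] = expression

    def render(self) -> str:
        """Return one combined block, or an empty string if nothing failed."""
        if not self.unparseable:
            return ""
        lines = [
            "UNPARSEABLE SPDX EXPRESSIONS",  # hypothetical section heading
            "The following files contain text strings that confuse the REUSE tool:",
        ]
        for path in sorted(self.unparseable):
            lines.append(f"* {path}: '{self.unparseable[path]}'")
        lines.append(f"Please see {HELP_URL} for an explanation and solution.")
        return "\n".join(lines)
```

The scanner would call `add("foobar.sh", 'MIT" > file.txt')` where it currently logs the two ERROR lines, and the lint command would print `render()` just before SUMMARY.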