reuse-tool
String constants with license parts in code
The following code fails during reuse lint, which complains about the identifier in the
license_identifier variable. That variable is used for writing license info to another file.
# SPDX-FileCopyrightText: 2019 Example
#
# SPDX-License-Identifier: LicenseRef-Proprietary
#
license_copyright_text = "SPDX-FileCopyrightText: 2019 Example"
license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"
fout = open('out.txt', 'w')
fout.write(license_copyright_text)
fout.write('\n')
fout.write(license_identifier)
fout.close()
However, this file already has a valid license header.
The error message is:
reuse._util - ERROR - Could not parse 'LicenseRef-Proprietary"'
reuse.project - ERROR - kek.py holds an SPDX expression that cannot be parsed, skipping the file
It fails on this line:
license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"
Specifically, the " character at the end.
Some context:
The tool is technically able to only search the first comment for SPDX tags. However, it doesn't do that, because files might pop up with a comment syntax that is not supported, or with some other manner of nonstandard comment header arrangement. So instead, it searches the first 4KiB of the file for tags.
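To illustrate why the trailing quote ends up in the error message, here is a minimal sketch of a scan over the start of a file. The regular expression and the name find_license_tags are assumptions made up for this example, not the tool's actual code, and the limit is counted in characters rather than bytes for simplicity.

```python
import re

# Hypothetical pattern: everything after the tag, up to the end of the
# line, is taken as the SPDX expression. Not the tool's real regex.
TAG_PATTERN = re.compile(r"SPDX-License-Identifier:[ \t]*(.*)")


def find_license_tags(text, limit=4096):
    """Return every expression found after the tag in the first `limit` characters."""
    head = text[:limit]
    return [match.group(1).strip() for match in TAG_PATTERN.finditer(head)]


source = '''# SPDX-License-Identifier: LicenseRef-Proprietary
#
license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"
'''

tags = find_license_tags(source)
# The second hit comes from the string constant and keeps its closing quote.
print(tags)
```

With a scan like this, the string constant is indistinguishable from a header tag, and its closing quote becomes part of the expression that the parser then rejects.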
This becomes annoying when tags appear inside the code, as happens a lot inside of this very repository. It is quite frustrating to work around this, but not impossible.
license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"
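The split works because Python joins adjacent string literals at compile time: the value is unchanged, but the source text no longer contains the tag as one contiguous string. A small check, using a plain substring test as a stand-in for the tool's real scanner:

```python
# Adjacent literals are concatenated by the Python compiler.
license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"

# The same line as it appears in the source file on disk.
source_line = 'license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"'

# Split here as well, so this snippet does not trip a scanner either.
TAG = "SPDX" "-License-Identifier:"

value_has_tag = TAG in license_identifier  # the runtime value is complete
source_has_tag = TAG in source_line        # the source text is not
print(value_has_tag, source_has_tag)
```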
It would probably be preferable, however, that instead of skipping the entire file when a single parse error is encountered, the tool would parse whatever it can, report the parse errors, and carry on. This requires some non-trivial work on the way the tool currently handles errors, so I simply haven't bothered, because the user needs to fix the error anyway.
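A rough sketch of what "parse what you can, report the rest" could look like. Everything here is hypothetical: collect_tags, the tag pattern, and especially the validator, which only accepts a single identifier and stands in for a real SPDX expression parser.

```python
import re

# Hypothetical tag pattern, not the tool's real one.
TAG_PATTERN = re.compile(r"SPDX-License-Identifier:[ \t]*(.*)")
# Stand-in validator: a single identifier only. A real SPDX expression
# parser also handles AND / OR / WITH and parentheses.
SIMPLE_ID = re.compile(r"[A-Za-z0-9.+-]+")


def collect_tags(text):
    """Return (valid, errors) instead of giving up on the first bad tag."""
    valid, errors = [], []
    for match in TAG_PATTERN.finditer(text):
        candidate = match.group(1).strip()
        if SIMPLE_ID.fullmatch(candidate):
            valid.append(candidate)
        else:
            errors.append(candidate)
    return valid, errors


source = '''# SPDX-License-Identifier: LicenseRef-Proprietary
license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"
'''

valid, errors = collect_tags(source)
print("valid:", valid)
print("could not parse:", errors)
```

The file would then still be reported with its valid header expression, and the unparseable candidate would show up as a warning rather than causing the whole file to be skipped.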
This error is so field-specific (it only shows up in projects that manipulate SPDX tags inside of their code) that I don't know if it is worth fixing, with a host of other things to do. I'll take a pull request, of course.
Interesting, this means: reuse-compliant code generators cannot generate reuse-compliant code.
Ah…
If you're generating code, it's not impossible to do it two-step:
- Generate the code to generated.py.
- reuse addheader --copyright="Jane Doe" --license="GPL-3.0-or-later" generated.py
But that's not the nicest solution.
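For what it's worth, the two steps can be wrapped in a small script. This is only a sketch: generate and addheader_command are made-up helper names, the generated content is a placeholder, and the subcommand and flags are copied verbatim from the command above, so they may differ in your installed version.

```python
import subprocess
from pathlib import Path


def generate(path):
    """Step 1: write the generated code without any SPDX header (placeholder content)."""
    Path(path).write_text("print('generated')\n", encoding="utf-8")


def addheader_command(path, holder="Jane Doe", license_id="GPL-3.0-or-later"):
    """Step 2: build the command line quoted in this thread."""
    return [
        "reuse",
        "addheader",
        f"--copyright={holder}",
        f"--license={license_id}",
        str(path),
    ]


def generate_with_header(path):
    """Run both steps; requires the reuse executable on PATH."""
    generate(path)
    subprocess.run(addheader_command(path), check=True)


command = addheader_command("generated.py")
print(" ".join(command))
```

Calling generate_with_header("generated.py") performs both steps; the snippet itself only prints the command so it can be inspected first.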
I could add a flag along the lines of reuse --only-header lint, in which case only the header comment would be parsed for SPDX info, but this would break for all unsupported filetypes/comment styles.
I'm not sure what a good solution is.
> I could add a flag along the lines of reuse --only-header lint, in which case only the header comment would be parsed for SPDX info, but this would break for all unsupported filetypes/comment styles.
Well, I guess there might be a few hints how the tool could detect whether the strings are part of the code or the header:
- If code style is known: check whether it's commented
- If code style is unknown: if there are at least X lines between the first (last) appearance of licensing/copyright info, assume that the newly found string is not part of the file's copyright/licensing information
- If code style is unknown: check how many different characters are in front of the string. We could assume that the licensing/copyright info is within the first X columns and not hidden behind a longer string (which is not only comment characters or spaces)
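As an illustration of the last hint, here is a minimal sketch of the column heuristic. The function name, the set of comment characters, and the threshold of 8 columns are all assumptions picked for the example.

```python
TAG = "SPDX-License-Identifier:"
# Assumed set of characters that may precede a tag in a comment header.
COMMENT_CHARS = set("#/*;-!% \t")


def looks_like_header_tag(line, max_column=8):
    """True if the tag starts within `max_column` columns and is preceded
    only by comment characters or whitespace."""
    index = line.find(TAG)
    if index == -1 or index > max_column:
        return False
    return all(ch in COMMENT_CHARS for ch in line[:index])


lines = [
    "# SPDX-License-Identifier: LicenseRef-Proprietary",
    "// SPDX-License-Identifier: CC-BY-4.0",
    'license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"',
    "print('// SPDX-License-Identifier: CC-BY-4.0', file=fp)",
]
results = [looks_like_header_tag(line) for line in lines]
print(results)
```

The two comment lines pass, while the string constant and the print call are rejected because the tag is preceded by code.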
@mxmehl That could work, but is fairly esoteric. If someone runs into an issue with that, it's going to be incredibly difficult for them to diagnose on their own.
Maybe that behaviour could be put behind a --prefer-header flag? Would be quite a bit of work for a very domain-specific problem, however.
There's also a super easy workaround: If generator.py contains SPDX tags inside of the code, then simply put the SPDX header for that file in generator.py.license. All contents of generator.py will then be ignored.
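A small sketch of creating such a sidecar file. The helper name write_license_sidecar is made up, the copyright holder and license are taken from the example at the top of this thread, and the demo runs in a temporary directory so it does not touch a real project.

```python
import tempfile
from pathlib import Path


def write_license_sidecar(source_path, holder, year, license_id):
    """Write <source_path>.license containing the SPDX tags for source_path."""
    sidecar = Path(str(source_path) + ".license")
    sidecar.write_text(
        f"SPDX-FileCopyrightText: {year} {holder}\n"
        "\n"
        f"SPDX-License-Identifier: {license_id}\n",
        encoding="utf-8",
    )
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    generator = Path(tmp) / "generator.py"
    generator.write_text("print('placeholder')\n", encoding="utf-8")
    sidecar = write_license_sidecar(
        generator, "Example", 2019, "LicenseRef-Proprietary"
    )
    sidecar_name = sidecar.name
    sidecar_text = sidecar.read_text(encoding="utf-8")

print(sidecar_name)
print(sidecar_text)
```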
I'll add my request to accommodate this scenario. In multiple Khronos projects we have code and documentation generation tools that embed licenses in the outputs, which means the SPDX-License-Identifier: tag appears in some confusing form in the generator code and results in reuse 'Could not parse' errors. I don't know what a good solution that's not intrusive on the target code would be - maybe something that recognizes common string patterns and (optionally?) ignores them - but for reference, here are a couple of the examples in our generator code which cause errors from 'reuse lint':
print('// SPDX-License-Identifier: CC-BY-4.0', file=fp)
prefixStrings = [
'** SPDX-License-Identifier: Apache-2.0',
Also, as there's already a mechanism for specifying licenses independently of files under .reuse, that might be a natural place to tag exceptions / guidance of this nature which reuse could respect on a per-file basis. I have the impression there may be a philosophical issue about simplicity of the tool that resists special-casing like this, but it is called for sometimes.
#464 will fix this.