reuse-tool String constants with license parts in code

The following code fails during reuse lint, which complains about the identifier in the license_identifier variable. That variable is only used for writing license info to another file.

# SPDX-FileCopyrightText: 2019 Example
#
# SPDX-License-Identifier: LicenseRef-Proprietary
#

license_copyright_text = "SPDX-FileCopyrightText: 2019 Example"
license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"

fout = open('out.txt', 'w')
fout.write(license_copyright_text)
fout.write('\n')
fout.write(license_identifier)
fout.close()

However, this file already has a valid license header.

The error message is:

reuse._util - ERROR - Could not parse 'LicenseRef-Proprietary"'
reuse.project - ERROR - kek.py holds an SPDX expression that cannot be parsed, skipping the file

Sep 17 '19 12:09 rvem

It fails on this line:

license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"

Specifically, the " character at the end.

Some context:

The tool is technically able to only search the first comment for SPDX tags. However, it doesn't do that, because files might pop up with a comment syntax that is not supported, or with some other manner of nonstandard comment header arrangement. So instead, it searches the first 4KiB of the file for tags.
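To see where the stray quote comes from, here is a minimal sketch of that kind of scan. It is not the tool's actual code: the pattern TAG_PATTERN, the function find_license_expressions, and the 4096-character cutoff (standing in for 4KiB) are all assumptions made for illustration.

```python
import re

# Hypothetical pattern; the real tool's regex may well differ.
TAG_PATTERN = re.compile(r"SPDX-License-Identifier:[ \t]*(.*)")


def find_license_expressions(source: str, limit: int = 4096) -> list[str]:
    """Return whatever follows the tag on each matching line in the first `limit` characters."""
    return [m.group(1).strip() for m in TAG_PATTERN.finditer(source[:limit])]


# A shortened version of the file from the report.
sample = (
    "# SPDX-License-Identifier: LicenseRef-Proprietary\n"
    "#\n"
    'license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"\n'
)

expressions = find_license_expressions(sample)
print(expressions)
# ['LicenseRef-Proprietary', 'LicenseRef-Proprietary"']
```

The second entry keeps the closing quote of the string literal, which is the kind of value an SPDX expression parser would reject.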

This becomes annoying when tags appear inside the code, as happens a lot inside of this very repository. It is quite frustrating to work around this, but not impossible.

license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"
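A quick check of why the split literal helps, assuming the scan looks for the tag as one contiguous piece of text: Python joins adjacent string literals at compile time, so the value written to the output is unchanged, while the source line itself never spells out the tag. The source_line variable below is only there for inspection.

```python
# Adjacent string literals are concatenated by the Python compiler.
license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"

# The same line as it appears on disk, held in a string for inspection.
source_line = 'license_identifier = "SPDX" "-License-Identifier: LicenseRef-Proprietary"'

tag = "SPDX" + "-License-Identifier:"
print(tag in license_identifier)  # True: the generated output still carries the tag
print(tag in source_line)         # False: a plain text scan of the file finds nothing
```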

It would probably be preferable, however, if the tool did not skip the entire file when it hits a single parse error, but instead parsed whatever it can, reported the parse errors, and carried on. This requires some non-trivial work on the way the tool currently handles errors, so I simply haven't bothered, because the user needs to fix the error anyway.

To me, this error is so field-specific (it only shows up in projects that manipulate SPDX tags inside of their code) that I don't know if it is worth fixing, with a host of other things to do. I'll take a pull request, of course.

Sep 20 '19 11:09 carmenbianca

Interesting, this means: reuse-compliant code generators cannot generate reuse-compliant code.

Nov 01 '19 00:11 uniqx

Ah…

If you're generating code, it's not impossible to do it two-step:

Generate the code to generated.py.
reuse addheader --copyright="Jane Doe" --license="GPL-3.0-or-later" generated.py
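If the generator is itself a Python script, the two steps could be chained roughly as follows. This is only a sketch: addheader_command and generate_then_annotate are hypothetical helper names, and the only thing taken from this thread is the reuse addheader invocation with its --copyright and --license options.

```python
import subprocess
from pathlib import Path


def addheader_command(path, copyright_holder, license_id):
    """Build the command line for step two (hypothetical helper)."""
    return [
        "reuse",
        "addheader",
        f"--copyright={copyright_holder}",
        f"--license={license_id}",
        str(path),
    ]


def generate_then_annotate(path, body, copyright_holder, license_id, run=subprocess.run):
    """Step one: write the generated code. Step two: let reuse add the header."""
    Path(path).write_text(body, encoding="utf-8")
    return run(addheader_command(path, copyright_holder, license_id), check=True)


command = addheader_command("generated.py", "Jane Doe", "GPL-3.0-or-later")
print(" ".join(command))
```

The run parameter is there so the call can be swapped for a stub when reuse is not installed.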

But that's not the nicest solution.

I could add a flag along the lines of reuse --only-header lint, in which case only the header comment would be parsed for SPDX info, but this would break for all unsupported filetypes/comment styles.

I'm not sure what a good solution is.

Nov 01 '19 12:11 carmenbianca

> I could add a flag along the lines of reuse --only-header lint, in which case only the header comment would be parsed for SPDX info, but this would break for all unsupported filetypes/comment styles.

Well, I guess there might be a few hints the tool could use to detect whether the strings are part of the code or part of the header:

If code style is known: check whether it's commented
If code style is unknown: if there are at least X lines between the first (last) appearance of licensing/copyright info, assume that the newly found string is not part of the file's copyright/licensing information
If code style is unknown: check how many different characters are in front of the string. We could assume that the licensing/copyright info is within the first X columns and not hidden behind a longer string (which is not only comment characters or spaces)
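The two "unknown code style" hints could be sketched like this. Everything here is an assumption for illustration: the function name header_tag_lines, the thresholds max_gap and max_column (the "X" values above), and the short COMMENT_CHARS set, which is nowhere near complete. The gap is measured from the last accepted tag.

```python
TAG = "SPDX-License-Identifier:"
COMMENT_CHARS = frozenset("#/*;-<!%")  # assumed, deliberately incomplete


def header_tag_lines(lines, max_gap=5, max_column=8):
    """Return the indices of lines whose tag looks like header information."""
    accepted = []
    for index, line in enumerate(lines):
        column = line.find(TAG)
        if column < 0:
            continue
        # Gap hint: a tag far below the last accepted one is treated as code.
        if accepted and index - accepted[-1] > max_gap:
            continue
        # Prefix hint: the tag must start early in the line and be preceded
        # only by comment characters or whitespace.
        prefix = line[:column]
        if column > max_column:
            continue
        if any(ch not in COMMENT_CHARS and not ch.isspace() for ch in prefix):
            continue
        accepted.append(index)
    return accepted


lines = [
    "# SPDX-FileCopyrightText: 2019 Example",
    "#",
    "# SPDX-License-Identifier: LicenseRef-Proprietary",
    "#",
    "",
    'license_identifier = "SPDX-License-Identifier: LicenseRef-Proprietary"',
    "print('// SPDX-License-Identifier: CC-BY-4.0', file=fp)",
]
result = header_tag_lines(lines)
print(result)
# [2]
```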

Nov 04 '19 17:11 mxmehl

@mxmehl That could work, but is fairly esoteric. If someone runs into an issue with that, it's going to be incredibly difficult for them to diagnose on their own.

Maybe that behaviour could be put behind a --prefer-header flag? Would be quite a bit of work for a very domain-specific problem, however.

There's also a super easy workaround: if generator.py contains SPDX tags inside of the code, then simply put the SPDX header for that file in generator.py.license. All contents of generator.py will then be ignored.
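For illustration, such a sidecar file could be created like this. The helper write_license_sidecar is a hypothetical name, the copyright and license values are the ones from the original report, and the temporary directory is only there to keep the sketch self-contained.

```python
import tempfile
from pathlib import Path


def write_license_sidecar(source_path, copyright_text, license_id):
    """Write <source_path>.license holding the SPDX tags for source_path."""
    sidecar = Path(f"{source_path}.license")
    sidecar.write_text(
        f"SPDX-FileCopyrightText: {copyright_text}\n"
        "\n"
        f"SPDX-License-Identifier: {license_id}\n",
        encoding="utf-8",
    )
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    generator = Path(tmp) / "generator.py"
    generator.write_text("print('generated')\n", encoding="utf-8")
    sidecar = write_license_sidecar(generator, "2019 Example", "LicenseRef-Proprietary")
    sidecar_name = sidecar.name
    sidecar_text = sidecar.read_text(encoding="utf-8")

print(sidecar_name)
# generator.py.license
```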

Nov 06 '19 12:11 carmenbianca

I'll add my request to accommodate this scenario. In multiple Khronos projects we have code and documentation generation tools that embed licenses in their outputs, which means the SPDX-License-Identifier: tag appears in some confusing form in the generator code and results in reuse 'Could not parse' errors. I don't know what a good solution would be that isn't intrusive on the target code (maybe something that recognizes common string patterns and, optionally, ignores them), but for reference, here are a couple of examples from our generator code that cause errors from 'reuse lint':

print('// SPDX-License-Identifier: CC-BY-4.0', file=fp)

prefixStrings = [
'** SPDX-License-Identifier: Apache-2.0',

Jun 03 '20 04:06 oddhack

Also, as there's already a mechanism for specifying licenses independently of files under .reuse, that might be a natural place to tag exceptions / guidance of this nature which reuse could respect on a per-file basis. I have the impression there may be a philosophical issue about simplicity of the tool that resists special-casing like this, but it is called for sometimes.

Jun 03 '20 04:06 oddhack

#464 will fix this.

Jan 23 '22 16:01 carmenbianca
