Find license identifiers in comments with ASCII art frames

Open pietroalbini opened this issue 3 years ago • 1 comments

The Rust project is working to adopt REUSE to annotate licenses, but when working on the initial implementation I stumbled on #343. The Rust repository has LLVM as one of its submodules, and REUSE currently errors out on some of the LLVM comments:

reuse._util - ERROR - Could not parse 'Apache-2.0 WITH LLVM-exception                    *|'
reuse.project - ERROR - 'src/llvm-project/lldb/source/Plugins/Plugins.def.in' holds an SPDX expression that cannot be parsed, skipping the file

As #343 correctly pointed out, the problem is that LLVM uses ASCII art "frames" for those code comments, like:

/***********************************************************\
|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception *|
\***********************************************************/

The solution I came up with is to implement generic handling for this kind of ASCII art, even in cases where the multiline delimiter (in this case *) is not present. When there are non-whitespace characters before SPDX-License-Identifier, the new code tries to strip their reverse from the end of the line. That correctly handles the LLVM comments, as well as any other ASCII art frame that is symmetric.
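To illustrate the idea, here is a minimal standalone sketch of the "strip the reversed prefix" heuristic. It is not the code from this pull request: the regular expression `_TAG` and the function name `extract_license_expression` are hypothetical and exist only for this example, which assumes the tag and its expression sit on a single line.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration: capture whatever precedes the tag
# (the "prefix") and everything after it, minus trailing whitespace.
_TAG = re.compile(
    r"^(?P<prefix>.*?)SPDX-License-Identifier:[ \t]+(?P<expr>.*?)[ \t]*$"
)


def extract_license_expression(line: str) -> Optional[str]:
    """Return the SPDX expression found on `line`, or None if there is no tag.

    If non-whitespace characters precede the tag, try to remove their
    reverse from the end of the expression, so that a symmetric frame
    such as `|* ... *|` does not end up inside the expression.
    """
    match = _TAG.match(line)
    if match is None:
        return None

    prefix = match.group("prefix").strip()
    expr = match.group("expr")

    if prefix:
        mirrored = prefix[::-1]  # "|*" becomes "*|", "/*" becomes "*/"
        if expr.endswith(mirrored):
            expr = expr[: -len(mirrored)].rstrip()

    return expr


if __name__ == "__main__":
    framed = "|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception *|"
    print(extract_license_expression(framed))
```

Lines whose prefix has no mirrored counterpart at the end, such as `# SPDX-License-Identifier: MIT`, are left unchanged by this sketch, since the suffix is only removed when it matches exactly.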

Fixes #343

pietroalbini, Jul 18 '22 14:07

Thanks for the quick review! I should've addressed everything.

pietroalbini, Jul 18 '22 15:07

Hey all, thanks for the reviews! What are the next steps to get this merged and released?

pietroalbini, Aug 18 '22 09:08

@pietroalbini Sorry for the delay. This is getting merged now for the next release, which shouldn't be too far off.

carmenbianca, Sep 22 '22 12:09