scancode-toolkit
Tagging license and license rules with "SPDX matching guidelines"
It would be nice to track whether a license text or license rule text matches the "SPDX matching guidelines".
@goneall would matching each text against the SPDX XML with https://github.com/spdx/LicenseListPublisher/blob/master/src/org/spdx/licenselistpublisher/LicenseXmlTester.java be the right approach? Does this code implement the guidelines correctly?
@pombredanne the code to implement the matching guidelines is here: https://github.com/spdx/Spdx-Java-Library/blob/master/src/main/java/org/spdx/utility/compare/LicenseCompareHelper.java
The code is rather complex due to the optional and variable text matching.
I initially tried to use regular expressions for the matching, but the attempt failed. There have also been two attempts to re-implement this in Python using regular expressions; they came close, but did not succeed when comparing all the license texts against the XML files for the listed licenses.
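To illustrate why optional and variable text push this beyond a single regular expression, here is a minimal Python sketch of a backtracking token matcher. The segment tuples (`"text"`, `"optional"`, `"var"`), the `max_tokens` bound and all function names are hypothetical, and `normalize` covers only a small subset of the guidelines (case, whitespace, dash and quote variants). It is not the algorithm used in LicenseCompareHelper.java and has not been checked against the SPDX license list.

```python
import re
from typing import Iterator, List

# Hypothetical template shape (an assumption, not the SPDX XML schema):
#   ("text", "literal words")       must match, after normalization
#   ("optional", [segments, ...])   may be present or absent
#   ("var", regex, max_tokens)      0..max_tokens tokens whose joined text
#                                   fully matches regex (case-insensitive)

_DASHES = re.compile(r"[\u2010-\u2015\u2212]")
_QUOTES = re.compile(r'[\u2018\u2019\u201c\u201d`"]')


def normalize(text: str) -> List[str]:
    """Apply a small subset of the guidelines, then split into tokens."""
    text = text.lower()               # case-insensitive comparison
    text = _DASHES.sub("-", text)     # dash variants -> "-"
    text = _QUOTES.sub("'", text)     # quote variants -> "'"
    return text.split()               # any whitespace run is one separator


def _seq_ends(segments: list, tokens: List[str], k: int) -> Iterator[int]:
    """Yield every token index at which `segments` can end when started at k."""
    if not segments:
        yield k
        return
    for mid in _seg_ends(segments[0], tokens, k):
        yield from _seq_ends(segments[1:], tokens, mid)


def _seg_ends(seg: tuple, tokens: List[str], k: int) -> Iterator[int]:
    """Yield every index at which a single segment can end when started at k."""
    kind = seg[0]
    if kind == "text":
        words = normalize(seg[1])
        if tokens[k:k + len(words)] == words:
            yield k + len(words)
    elif kind == "optional":
        yield from _seq_ends(seg[1], tokens, k)   # try "present" first
        yield k                                   # then "absent"
    elif kind == "var":
        pattern, max_tokens = re.compile(seg[1], re.IGNORECASE), seg[2]
        for end in range(k, min(k + max_tokens, len(tokens)) + 1):
            if pattern.fullmatch(" ".join(tokens[k:end])):
                yield end
    else:
        raise ValueError(f"unknown segment kind: {kind!r}")


def matches(template: list, text: str) -> bool:
    """True if some way of matching the template consumes the whole text."""
    tokens = normalize(text)
    return any(end == len(tokens) for end in _seq_ends(template, tokens, 0))


template = [
    ("optional", [("text", "The MIT License")]),
    ("var", r"copyright.*", 12),
    ("text", "Permission is hereby granted, free of charge, to any person"),
]

print(matches(template, "Copyright (c) 2020 Jane Doe\n\nPermission is hereby "
                        "granted, free of charge, to any person"))  # prints True
```

The `"var"` segment first proposes several possible end points (the copyright regex would happily swallow the following words too), and only the one that lets the next `"text"` segment line up survives. Exploring alternatives like this is exponential in the worst case, so a real implementation would likely need memoization or a smarter search.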
It would be very beneficial if we implemented the standard matching in the tools-python library.
Although I don't have the bandwidth or Python expertise to implement this myself, I'd be happy to help support the effort.
Also, if I were to implement the license matching from scratch, I would use the XML format as input rather than the template format. The original Java code was written before the XML format was specified.
@goneall thank you ++