Spdx-Java-Library
Spdx-Java-Library copied to clipboard
Official GPL-2.0 license text not recognized
For the license text https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt I get the following:
System.out.println(Arrays.toString(LicenseCompareHelper.matchingStandardLicenseIds(licenseText)));
System.out.println(LicenseCompareHelper.matchingStandardLicenseIdsWithinText(licenseText));
outputs
[]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]
The two outputs should be the same since the GPL-2.0 license spans the whole file. Tested with version 1.1.11 This problem is similar to 217
PR #236 includes unit tests that reproduce this problem, albeit with other license texts - the issue is not limited to just GPL-2.0, or indeed even just GPL family licenses.
See also #234.
I figured out a problem that could explain this case. I think the tokenization does not work properly. Example:
String license1 = "<one";
String template1 = "<<beginOptional>><<<endOptional>>one";
String license2 = "< one";
System.out.println("template1, license1: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license1).getDifferenceMessage());
System.out.println("template1, license2: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license2).getDifferenceMessage());
Returns
template1, license1: Normal text of license does not match at end of text when comparing to template text "one
". Last optional text was not found due to the optional difference:
Normal text of license does not match at end of text when comparing to template text "<"
template1, license2: No difference found
When I debug I see that for the first case in org.spdx.utility.compare.CompareTemplateOutputHandler.compareText the matchTokens parameter is ["<one"]. I think it should instead be ["<", "one"] like in the second case.
Also if I remove all < and > from the https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt text (
gpl-2.0-removed-angle-brackets.txt
) or if I add a space before and after every < and > (
gpl-2.0-spaces-between-angle-brackets-and-text.txt
) I get the following result for the code in the issue description:
[GPL-2.0, GPL-2.0-only]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]
Thanks @sdheh for the analysis! I agree, the tokenization is the issue. I'm still working on the 3.0 update, so I won't have much time over the next week or so to look for a fix, but if you want to create a pull request I can review / merge.
I ran into this issue as well and believe it affects several templates where the optional text would not be tokenized separately. For the above case I did experiment with the following change but am still learning the code base and am unclear on the total impact:
+++ b/src/main/java/org/spdx/utility/compare/CompareTemplateOutputHandler.java
@@ -247,7 +247,13 @@ public class CompareTemplateOutputHandler implements
nextToken = sub.match(matchTokens, nextToken, endToken, originalText,
optionalDifference, tokenToLocation);
if (nextToken < 0) {
- if (!ignoreOptionalDifferences) {
+ // Handle optional text tokenized with license text to handle single char optional text
+ String optionalText = sub.getText();
+ String matchText = LicenseTextHelper.getTokenAt(matchTokens, startToken);
+ if (matchText != null && optionalText != null && matchText.length() > optionalText.length() && matchText.startsWith(optionalText)) {
+ // remove optional text from start of match token
+ matchTokens[startToken] = matchText.substring(optionalText.length());
+ } else if (!ignoreOptionalDifferences) {
setLastOptionalDifference(optionalDifference);
}
return startToken; // the optional text didn't match, just return the start token
Modifying the parsed tokens feels wrong but could not sort out an easy way to adjust the tokenization of the license text being matched to the template.
@douglasclarke this was fixed in PR #249, which is merged but awaiting release.
@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:
String template = "<<beginOptional>><<<endOptional>>one";
String sample = "<one";
DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);
@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:
String template = "<<beginOptional>><<<endOptional>>one"; String sample = "<one"; DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);
Thanks @douglasclarke for reporting this along with the specific example.
This may be a separate issue - let's open a new issue and close this one as resolved since the prior examples seem to be fixed with the latest.
Just a head's up that this license list XML issue will also cause matching of (recent) GPL-2.0 texts to fail, but not because of anything wrong in Spdx-Java-Library. I only mention it as sometimes it can be time-consuming to discern whether a matching bug is in the code, the templates, or (as in this case) a breaking change to the official license text by the license publisher.
This will be resolved once https://github.com/spdx/license-list-XML/pull/2632 is merged. Merging the PR depends on generating a new version of the license list publisher - which I'm hoping to get out in the next week or so.