Match expressions for license templates fail when using white space
When white space is used in a pattern for the <<var
expression in a license template, the pattern will sometimes not match.
This is due to the license text being tokenized as part of the match.
It fails specifically if the space is at the very end of text to be matched. The space will be trimmed off of the comparison text causing the failure.
A workaround is to make the space optional with the "?" quantifier (e.g. " |" -> " ?|").
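The failure and the workaround can be reproduced with a small model. This is only a sketch: `matches_after_trim` is a hypothetical stand-in for the tool's comparison step, assuming that step trims trailing white space from the candidate text before a full match; the patterns are made up for illustration and do not come from license-list-XML.

```python
import re


def matches_after_trim(pattern: str, candidate: str) -> bool:
    """Hypothetical model of the comparison: trim the text, then full-match."""
    # Assumption: trailing white space is lost before the regex is applied.
    return re.fullmatch(pattern, candidate.strip()) is not None


# The first alternative ends in a required space, which the trimmed text lacks.
strict = r"licensed |licensed-"
# Workaround from the thread: " |" -> " ?|" makes the trailing space optional.
relaxed = r"licensed ?|licensed-"

strict_result = matches_after_trim(strict, "licensed ")    # space trimmed, no match
relaxed_result = matches_after_trim(relaxed, "licensed ")  # optional space, match
```

In this model the strict pattern fails only because the space it requires sits at the very end of the matched text; a space in the middle of the text would survive the trim.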
Can't we fix this here (vs. in license-list-XML) by adjusting the license regexps if necessary? I'd rather not break the XML to work around a tooling issue.
I would also like to fix this in the tools; however, it looks like that would require a major design change and cause a significant performance problem.
If you have an idea how to fix without a major design change, please propose the details as a comment or do a PR.
If you have an idea how to fix without a major design change...
For the handful of license-list-XML cases, you could replace " |" with " {1}|" in regexps before compiling them. That won't always be right, but it would be right for the current license-list-XML cases.
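A rough sketch of that pre-compile rewrite, and of one way it "won't always be right", might look like the following. The helper name `pin_space_before_bar` and the sample patterns are hypothetical. Whether the rewritten form actually avoids the trimming depends on the tool's internals, which this thread does not show; the sketch only checks regex semantics.

```python
import re


def pin_space_before_bar(pattern: str) -> str:
    """Hypothetical helper: rewrite every " |" as " {1}|" before compiling."""
    return pattern.replace(" |", " {1}|")


# Simple alternation: " {1}" means exactly one space, so the meaning is unchanged.
simple = "this |that"
rewritten = pin_space_before_bar(simple)
same_behaviour = all(
    bool(re.fullmatch(simple, s)) == bool(re.fullmatch(rewritten, s))
    for s in ["this ", "this", "that", "this  "]
)

# Inside a character class the naive rewrite changes the meaning:
# "{", "1" and "}" become literal members of the class.
char_class = "a[ |]b"
broken = pin_space_before_bar(char_class)
```

A blind string replacement is therefore only safe when the space and bar sit in a plain alternation, which is the situation described for the current license-list-XML cases.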
In the case where you have white space involved in the matching, I would prefer to update the license list XML matches to include the words surrounding the white space in the match. It is more precise and efficient.
For example, to match "this -and" or "this-and", you could have a pattern <alt match="this-and|this and"...
It could also be generalized to <alt match="this-and|this\sand"...
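The difference between the two patterns can be checked directly. This is a sketch under the same assumption as the issue description, namely that the compared text has its outer white space trimmed; the `matches` helper is hypothetical and stands in for the tool's comparison.

```python
import re

# The two alternations proposed above, taken from the match attributes.
literal_space = re.compile(r"this-and|this and")
any_white_space = re.compile(r"this-and|this\sand")


def matches(regex: re.Pattern, text: str) -> bool:
    """Hypothetical comparison: trim outer white space, then full-match."""
    return regex.fullmatch(text.strip()) is not None


# The space is now interior to the matched text, so trimming cannot remove it.
interior_space_ok = matches(literal_space, "this and ")
hyphen_ok = matches(literal_space, "this-and")

# \s also accepts other white space characters, such as a tab.
tab_with_literal = matches(literal_space, "this\tand")
tab_with_class = matches(any_white_space, "this\tand")
```

Because the surrounding words anchor the white space in the middle of the match, neither pattern depends on a trailing space surviving the comparison.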
On Wed, Dec 27, 2017 at 05:12:36AM +0000, goneall wrote:
In the case where you have white space involved in the matching, I would prefer to update the license list XML matches to include the words surrounding the white space in the match. It is more precise and efficient.
More efficient, possibly (I could see this going either way). But I don't see how including non-variable text in the regular expression would make things more precise.