
Match expressions for license templates fail when using white space

Open goneall opened this issue 7 years ago • 5 comments

When white space is used in a pattern for the <<var expression in a license template, the match will sometimes fail.

This is due to the license text being tokenized as part of the match.

It fails specifically if the space is at the very end of the text to be matched. The space is trimmed off the comparison text, which causes the failure.

A workaround is to use the optional keyword (e.g. " |" -> " ?|").
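The failure mode and the workaround can be illustrated with a small sketch. This is not the tools' actual matching code: it assumes, as described above, that trailing white space is trimmed from the comparison text before the pattern is applied as a full match. The helper name `matches_after_trim` and both patterns are hypothetical and chosen only for illustration.

```python
import re


def matches_after_trim(pattern: str, text: str) -> bool:
    """Hypothetical stand-in for the matcher: trim the text, then require a full match."""
    trimmed = text.strip()  # assumption: tokenizing drops the trailing space
    return re.fullmatch(pattern, trimmed) is not None


# Hypothetical pattern whose second alternative requires a trailing space.
strict = r"this-|this "
# Same pattern with the workaround applied: the trailing space is optional.
relaxed = r"this-|this ?"

for candidate in ("this ", "this-"):
    print(repr(candidate),
          "strict:", matches_after_trim(strict, candidate),
          "relaxed:", matches_after_trim(relaxed, candidate))
```

Under these assumptions, the strict pattern rejects the text that ends in a space because the space is gone by the time the pattern is applied, while the relaxed pattern accepts both forms.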

goneall avatar Dec 21 '17 23:12 goneall

A workaround is to use the optional keyword (e.g. " |" -> " ?|").

Can't we fix this here (vs. in license-list-XML) by adjusting the license regexps if necessary? I'd rather not break the XML to work around a tooling issue.

wking avatar Dec 26 '17 20:12 wking

Can't we fix this here (vs. in license-list-XML) by adjusting the license regexps if necessary? I'd rather not break the XML to work around a tooling issue.

I would also like to fix this in the tools; however, it looks like that would require a major design change that would cause a significant performance problem.

If you have an idea how to fix without a major design change, please propose the details as a comment or do a PR.

goneall avatar Dec 27 '17 03:12 goneall

If you have an idea how to fix without a major design change...

For the handful of license-list-XML cases, you could replace | with {1}| in regexps before compiling them. That won't always be right, but it would be right for the current license-list-XML cases.

wking avatar Dec 27 '17 03:12 wking

In the case where you have white space involved in the matching, I would prefer to update the license list XML matches to include the words surrounding the white space in the match. It is more precise and efficient.

For example, to match "this and" or "this-and", you could have a pattern <alt match="this-and|this and"... It could also be generalized to <alt match="this-and|this\sand"...
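A rough sketch of why this avoids the trimming problem, using Python's `re` module rather than the tools' own matcher (which may behave differently). The assumption is that only leading and trailing white space is trimmed from the comparison text, so a space between two words survives. The variable names are illustrative only.

```python
import re

# The two patterns suggested above, with the surrounding words included.
literal = re.compile(r"this-and|this and")
general = re.compile(r"this-and|this\sand")

for text in ("this and", "this-and", "this\tand"):
    trimmed = text.strip()  # assumption: only the ends of the text are trimmed
    print(repr(text),
          "literal:", bool(literal.fullmatch(trimmed)),
          "general:", bool(general.fullmatch(trimmed)))
```

Because the white space is now interior to the match, trimming leaves it intact. The `\s` form also accepts other white space characters such as a tab, which the literal-space form does not.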

goneall avatar Dec 27 '17 05:12 goneall

On Wed, Dec 27, 2017 at 05:12:36AM +0000, goneall wrote:

In the case where you have white space involved in the matching, I would prefer to update the license list XML matches to include the words surrounding the white space in the match. It is more precise and efficient.

More efficient, possibly (I could see this going either way). But I don't see how including non-variable text in the regular expression would make things more precise.

wking avatar Dec 27 '17 05:12 wking