functions fmtmatch() and fmtre() to convert between glob and ERE patterns are buggy

Open krader1961 opened this issue 4 years ago • 1 comment

I recently added many more API tests of the fmtre() and fmtmatch() functions, primarily to see if I could improve test coverage to acceptable levels. However, the API tests don't actually verify that the patterns are equivalent; they only verify that the conversion produces the expected, but possibly incorrect, output. So I just added (not yet committed) tests to src/cmd/ksh93/tests/sh_match.sh to verify that the patterns produce the same results when used in a shell script. Not surprisingly, some patterns are incorrectly converted:

<E> sh_match[450]: pattern #13 ere |^x\!y$| and glob |x\!y| produce diff output
<E> sh_match[450]: pattern #20 ere |^x|y$| and glob |@(x|y)| produce diff output
<E> sh_match[450]: pattern #31 ere |^x\a|b$| and glob |@(xa|b)| produce diff output
<E> sh_match[493]: pattern #13 ere |^x\!y$| and glob |x!y| produce diff output
<E> sh_match[493]: pattern #21 ere |^x|y$| and glob |x|y| produce diff output
<E> sh_match[493]: pattern #34 ere |^z.?a$| and glob |~(E)z.?a| produce diff output
<E> sh_match[493]: pattern #38 ere |^x\a|b$| and glob |x\a|b| produce diff output

Notice that most of the failures involve patterns that use the alternation operator, |. Consider the ERE ^x|y$. Alternation has the lowest precedence in an ERE, so each anchor binds to only one branch: the pattern matches lines that begin with x or end with y. But the glob produced by fmtmatch() is @(x|y), which matches only lines that are exactly one character long, where that character is x or y. In other words, the generated glob is actually equivalent to the ERE ^(x|y)$.
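The precedence difference can be checked directly with grep -E. This is a minimal sketch, not part of the ksh test suite: the sample lines and the matches helper are made up for illustration, and only the two EREs come from the discussion above.

```shell
# Illustrative sample lines (an assumption, not taken from sh_match.sh).
samples='x
y
xylophone
happy
abc'

# Hypothetical helper: print the sample lines matched by the ERE in $1,
# joined by single spaces.
matches() {
    printf '%s\n' "$samples" | grep -E -- "$1" | paste -sd ' ' -
}

loose=$(matches '^x|y$')      # begins with x OR ends with y
strict=$(matches '^(x|y)$')   # the whole line is "x" or "y"

printf 'ERE ^x|y$   matches: %s\n' "$loose"
printf 'ERE ^(x|y)$ matches: %s\n' "$strict"
```

The first ERE accepts xylophone and happy, while the second accepts only the single-character lines, which is the behavior of the generated glob.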

These functions are cool, but I would never trust their output and therefore would never use them. If we were evaluating whether they should be added to the project, I would vote no. Sadly, even though it is highly unlikely that any ksh script is using this feature, we can't afford to break backward compatibility by removing it.

krader1961 avatar Jul 28 '19 01:07 krader1961

P.S. The way to use these in a script is something like this:

glob=$(print -f '%P' -- "$ere")
[[ "$line" = $glob ]] && print matches
ere=$(print -f '%R' -- "$glob")
[[ "$line" =~ $ere ]] && print matches
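The same two tests can be combined to check whether a converted pair really is equivalent. The following is a rough sketch of that idea, not the actual code in sh_match.sh: the disagreements helper and the sample lines are hypothetical, the pattern pair is hard-coded from failure #20 rather than produced by print -f, and it assumes a shell with [[ ]] and ksh-style extended globs (ksh93 or bash).

```shell
# Hypothetical helper: print each sample line on which the ERE ($1) and the
# glob ($2) give different answers. Variables are deliberately global so the
# function works in both ksh93 and bash.
disagreements() {
    ere=$1 glob=$2
    shift 2
    for line in "$@"; do
        r_ere=no r_glob=no
        [[ $line =~ $ere ]] && r_ere=yes
        [[ $line == $glob ]] && r_glob=yes
        [ "$r_ere" = "$r_glob" ] || printf '%s\n' "$line"
    done
}

# ERE and glob from failure #20; the sample lines are illustrative only.
diff_20=$(disagreements '^x|y$' '@(x|y)' x y xylophone happy abc | paste -sd ' ' -)
printf 'lines where the patterns disagree: %s\n' "$diff_20"
```

An empty result would suggest the two patterns agree on the chosen samples; it would not prove they are equivalent in general.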

krader1961 avatar Jul 28 '19 01:07 krader1961