regex-tdfa

The POSIX standard does not appear to allow empty regex...

Open twhitehead opened this issue 4 years ago • 3 comments

Just a quick note that a lot of the other-implementations-are-not-compliant examples appear to be about empty patterns (e.g., issues with the matching of () in (()|.)(b)).

If you read the linked POSIX standard, however, it seems that such empty expressions are not actually valid regexes. For example, the extended regex grammar it defines is

extended_reg_exp   :                      ERE_branch
                   | extended_reg_exp '|' ERE_branch
                   ;
ERE_branch         :            ERE_expression
                   | ERE_branch ERE_expression
                   ;
ERE_expression     : one_char_or_coll_elem_ERE
                   | '^'
                   | '$'
                   | '(' extended_reg_exp ')'
                   | ERE_expression ERE_dupl_symbol
                   ;
one_char_or_coll_elem_ERE  : ORD_CHAR
                   | QUOTED_CHAR
                   | '.'
                   | bracket_expression
                   ;
ERE_dupl_symbol    : '*'
                   | '+'
                   | '?'
                   | '{' DUP_COUNT               '}'
                   | '{' DUP_COUNT ','           '}'
                   | '{' DUP_COUNT ',' DUP_COUNT '}'
                   ;

from which I don't see how you can form (). The parentheses must contain an extended_reg_exp, which has to consist of at least one ERE_branch, which must consist of at least one ERE_expression, which in turn must contain at least one character of some sort.
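
To make that derivation argument concrete, here is a rough recursive-descent recognizer for the grammar quoted above, written in Python. It is only a sketch under some assumptions that are not part of the standard's text: the function name is_valid_ere is made up for illustration, bracket expressions are handled in a simplified way (everything up to the next closing bracket), and anything the grammar does not derive is simply reported as invalid, even where POSIX calls the result undefined rather than an error.

```python
# Characters that are special in an ERE and so cannot be an ORD_CHAR.
SPECIAL = set("^.[$()|*+?{\\")
DIGITS = "0123456789"


def is_valid_ere(pattern: str) -> bool:
    """Return True if `pattern` is derivable from the quoted ERE grammar."""
    pos = 0
    n = len(pattern)

    def peek():
        return pattern[pos] if pos < n else None

    def parse_ere() -> bool:
        # extended_reg_exp : ERE_branch ('|' ERE_branch)*
        nonlocal pos
        if not parse_branch():
            return False
        while peek() == "|":
            pos += 1
            if not parse_branch():
                return False
        return True

    def parse_branch() -> bool:
        # ERE_branch : ERE_expression+   (at least one is required)
        if not parse_expression():
            return False
        while peek() not in (None, "|", ")"):
            if not parse_expression():
                return False
        return True

    def parse_expression() -> bool:
        # ERE_expression : atom followed by zero or more ERE_dupl_symbol
        nonlocal pos
        c = peek()
        if c is None:
            return False
        if c == "(":
            pos += 1
            if not parse_ere() or peek() != ")":
                return False
            pos += 1
        elif c == "[":
            if not parse_bracket():
                return False
        elif c == "\\":
            # QUOTED_CHAR: a backslash followed by a special character
            if pos + 1 >= n or pattern[pos + 1] not in SPECIAL:
                return False
            pos += 2
        elif c in "^$.":
            pos += 1
        elif c in SPECIAL:
            # ')', '|', '*', '+', '?', '{' cannot start an expression
            return False
        else:
            pos += 1  # ORD_CHAR
        while parse_dupl():
            pass
        return True

    def parse_bracket() -> bool:
        # Simplified assumption: '[' ['^'] [']'] ... ']'
        nonlocal pos
        i = pos + 1
        if i < n and pattern[i] == "^":
            i += 1
        if i < n and pattern[i] == "]":
            i += 1  # a leading ']' is a literal member
        j = pattern.find("]", i)
        if j == -1:
            return False
        pos = j + 1
        return True

    def parse_dupl() -> bool:
        # ERE_dupl_symbol : '*' | '+' | '?' | '{' m '}' | '{' m ',' '}' | '{' m ',' n '}'
        nonlocal pos
        c = peek()
        if c is not None and c in "*+?":
            pos += 1
            return True
        if c != "{":
            return False
        i = pos + 1
        start = i
        while i < n and pattern[i] in DIGITS:
            i += 1
        if i == start:
            return False  # DUP_COUNT is mandatory
        if i < n and pattern[i] == ",":
            i += 1
            while i < n and pattern[i] in DIGITS:
                i += 1
        if i < n and pattern[i] == "}":
            pos = i + 1
            return True
        return False

    return parse_ere() and pos == n


if __name__ == "__main__":
    for p in ["(a|.)(b)", "()", "(()|.)(b)", "()oo", "a|", ""]:
        print(repr(p), is_valid_ere(p))
```

With this sketch, (a|.)(b) is accepted, while (), (()|.)(b), ()oo, a| and the empty pattern are all rejected, because each would need an ERE_branch with no ERE_expression in it.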

twhitehead, Oct 01 '21 15:10

Thanks for the report, @twhitehead!

I suppose this is an issue with the Wiki rather than with regex-tdfa, but there is no bug tracker for the Wiki, and the Wiki does not seem to be actively maintained. It could make sense to move the relevant parts of the Wiki into the documentation of one of the regex-* packages, but then it would likely be regex-base.

andreasabel, Oct 04 '21 08:10

FWIW, % git grep -E '()oo' works, for example. I'm not sure which regexp engine git grep (and GNU grep) use, but it works anyhow.

phadej, Oct 04 '21 08:10

Feel free to close this if you want. As you said, there is obviously nothing that needs to be done to the code itself. I just happened to notice this and figured I should point it out. I've now added it to the talk page for the wiki entry.

twhitehead, Oct 06 '21 15:10