Add an operator to get ordered alternative in the DSL

Open ngrunwald opened this issue 13 years ago • 2 comments

Alternatives in a regex are not unions: they are not commutative. Because Clojure sets are unordered, the behavior of some regexps is unspecified. This patch adds a | operator that respects the order of the alternatives. I don't know whether the unordered set should be kept for the cases where order makes no difference, or removed because it can be a source of confusion.
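To make the non-commutativity concrete, here is a minimal sketch against java.util.regex, which is the engine Clojure regexes compile to. The class name and the `firstMatch` helper are made up for illustration and are not part of the patch or the DSL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: shows that swapping two alternatives changes the match.
class OrderedAlternationDemo {

    // Hypothetical helper: returns the first substring matched by regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The engine tries alternatives left to right and keeps the first that succeeds.
        System.out.println(firstMatch("a|ab", "ab")); // prints "a"
        System.out.println(firstMatch("ab|a", "ab")); // prints "ab"
    }
}
```

If the alternatives come from an unordered set, either of these two patterns could be the one that gets emitted, which is the unspecified behavior described above.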

ngrunwald avatar Jul 26 '12 17:07 ngrunwald

Hi Nils,

The least I can say is that I'm not a big fan of ordered alternatives.

As my latest commits may hint, I want to make regexes backed by automata so as to be able to reason about them (e.g. computing prefix sets). And from the automaton representation, to serialize back to Java regex or emit Clojure code, I think that TNFAs (laurikari.net/ville/spire2000-tnfa.pdf) provide a solid foundation and should still allow for ordered alternatives if that's what you are into.

So that's for the mid term direction.

Meanwhile I'm ok to accept a patch for ordered alternatives but:

  • there's a lot of unrelated whitespace changes
  • I'd prefer a more dissuasive and explicit name than "|" because I don't want the user to accidentally use it without needing it.

If you are interested in taking a stab at TNFA, ping me.

cgrand avatar Jul 27 '12 08:07 cgrand

Hi Christophe,

Sorry about the whitespace changes, my emacs config is somewhat aggressive and I did not check the diff. I agree about ordered alternatives, but regexps with alternatives in which one side is a subset of the other have to be ordered from the more specific to the less specific to be of any use. I have looked at the paper and I agree that being able to reflect easily on the regexp would be great, if only to better compile and optimize regexps. The system of transition priorities described in it is indeed sufficient to make the system deterministic and the alternatives ordered. I'll ping you outside this thread as this is probably not the best place to discuss this :)
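For the simple case where the alternatives are plain literals, "more specific first" can be approximated by sorting longest first before joining with |. The sketch below assumes exactly that case. The `alternation` and `firstMatch` helpers are hypothetical names, not anything in the library, and general sub-patterns would need a real containment check rather than a length sort.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative only: turns an unordered set of literals into a deterministic,
// longest-first alternation.
class SpecificFirstDemo {

    // Longer literals first; ties broken alphabetically so the output is stable.
    static final Comparator<String> LONGEST_FIRST = (a, b) ->
            a.length() != b.length()
                    ? Integer.compare(b.length(), a.length())
                    : a.compareTo(b);

    // Hypothetical helper: quotes each literal and joins them in LONGEST_FIRST order.
    static Pattern alternation(Set<String> literals) {
        String body = literals.stream()
                .sorted(LONGEST_FIRST)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + body + ")");
    }

    // Hypothetical helper: returns the first substring matched by p, or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern p = alternation(Set.of("in", "int", "integer"));
        // "integer" is tried before "int" and "in", so the longest literal wins.
        System.out.println(firstMatch(p, "integers")); // prints "integer"
    }
}
```

With the literals emitted in set iteration order instead, "in" could come first and shadow the other two.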

ngrunwald avatar Jul 27 '12 14:07 ngrunwald