Three problems about regular expressions
Hi, I have three questions about regular expressions,
-
What is the difference between
re. from_ecma2020andre.from.str? -
Does OSTRICH currently support regular expressions with backreferences? E.g.,
(.)\1. -
I found that
(?=a).|c->(model (define-fun w () String "a")), but((?=a).|c)->(model (define-fun w () String "\u{0}")), but\u0000is not accepted by((?=a).|c).
Hi, I have three questions about regular expressions,
Hi! Since nobody's responded to you I'll do my best.
- What is the difference between
re. from_ecma2020andre.from.str?
re.from_str should either convert to/from textbook regular expressions (i.e. not the full ECMA standard), or it's a different kind of parser. Essentially what you want is from_ecma2020, IIRC (@uuverifiers can fill you in!), it should be strictly better.
- Does OSTRICH currently support regular expressions with backreferences? E.g.,
(.)\1.
Not in general since they in the general case make the language not regular and therefore cannot be handled in Ostrich, but @RiccardoDeMasellis and @uuverifiers are working on accommodating some practical use cases. I don't remember how much of that is in the main tree yet though.
- I found that
(?=a).|c->(model (define-fun w () String "a")), but((?=a).|c)->(model (define-fun w () String "\u{0}")), but\u0000is not accepted by((?=a).|c).
That sounds like a bug to me. I know we have had some problems with look-arounds in general so this might be another one.
Sorry for the delay!
Concerning 2), handling back-references is not something we are currently working on; adding support for back-references is one of our long-term goals though.
Concerning 3), I get a correct model for this pattern when using the function re.from_ecma2020. It is true, however, that we are still working on full support for anchors and lookarounds, currently there are still some cases of ECMAScript regular expressions not handled correctly.
Thank you for your answers. @amandasystems @pruemmer
Supporting backreferences is really difficult since it makes the regex language not regular.
For question 3, I wonder if, assuming that the regex is parsed well, i.e., L(r) is converted to an equivalent SMT constraint S. When I solve S using OSTRICH, is its model always consistent with L(r)?
Again, thank you for your works. :)
There are still some cases involving anchors or lookarounds that are not handled correctly, even with the re.from_ecma2020 function. This is something we are currently working on, and will hopefully be able to resolve soon!