compile-time-regular-expressions
compile-time-regular-expressions copied to clipboard
Valid regex does not match
Hello. I have the following regex:
#include "ctll/fixed_string.hpp"
#include "ctre.hpp"
static constexpr auto ptv_regex = ctll::fixed_string(
"^(\\d+)(?:[vV](\\d+))?[Pp][Tt][Vv](?:(\\d+)|(\\d+)([AaBbCc]?))?(!)?(?:$|\\^(.+))"
);
which I try to match using:
auto found = ctre::search<ptv_regex>( "1v2PTV1a" );
On regex101.com, this regex matches for the python, PCRE and PCRE2 regex "engines", so I believe the regex is valid and OK. With the CTRE library (version 3.6 from vcpkg and that which is available as trunk on compiler explorer today), it does not match.
Sorry that I don't have the knowledge to try and pinpoint the problem. Just thought I would report it.
Yep, here's a bug. I can reduce it to
#include "ctll/fixed_string.hpp"
#include "ctre.hpp"
static constexpr auto ptv_regex = ctll::fixed_string(
"(?:a|ab)?"
);
bool x()
{
auto found = ctre::match<ptv_regex>( "ab" );
return found;
}
https://godbolt.org/z/jqn9vrrWc
Thanks for the minimisation, it seems to me there is a deeper problem, and I'm currently don't know how to fix it. Will try to look at it soon.
It seems to related when you have a common prefix in selection inside a cycle.
I am not an expert on how this library implements matching, but it seems to me that there are two factors:
- Alternations of the form
a|abwill always try to matchabeforeab. - Cycles are implemented by repeatedly calling
evaluateon the content of the cycle:
while (less_than_or_infinite<B>(i)) {
auto inner_result = evaluate(..., ctll::list<Content..., end_cycle_mark>());
// ... match Tail if inner_result matches
}
The problem is that, at some point, the expression a|ab has to figure that it needs to match ab immediately instead of a, even if it can only match the prefix a. This cannot happen if the content of the cycle is matched as ctll::list<Content..., end_cycle_mark>, because Content has no idea that Tail comes after it and that it might need to backtrack based on Tail's ability to match. The evaluation of Content thinks the regex ends at end_cycle_mark.
I am writing this because I have also implemented compile-time regular expression matching as a hobby-project (inspired by this library, of course) and I have come across the same bug in my code. The solution that i have found involves using a callback function that calls evaluate for the rest of the expression and passing this function to the evaluation method for Content, so that, after matching Content, the same evaluate method can use the callback to check if Tail matches. It's basically a continuation-passing style approach.