re1.5
Matching functions abort on "complex" regexes instead of returning an error
Consider the regex (a*)*.
Currently the recursive, recursiveloop and backtrack implementations abort on it.
(BTW, this particular infinite loop, in which the PC advances without the SP (string pointer) ever advancing too, can be fixed, but this is not the point of this issue.)
The first 2 implementations (recursive and recursiveloop) abort with a runtime error (stack overflow); the last one aborts when a max-depth limit check fails.
This means that, for example, a whole uPy program would abort in such a case instead of raising an exception.
I propose to fix it (the possibility of aborting) by allowing the match functions to return a negative value, just like pcre_exec, as follows:
0 - no match
positive - number of captures (this fixes another problem, not discussed yet)
negative - error number (max stack depth exceeded, Unicode error, and some more error types that can happen while matching a regex)
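To make the proposed convention concrete, here is a minimal sketch in C. It is not re1.5 code: the toy matcher, the TOY_* names, the error values and the depth limit of 8 are all hypothetical and chosen only for illustration. It shows a recursive matcher that reports "depth exceeded" through a negative return value instead of aborting, and a caller that tells the three cases apart.

```c
#include <stddef.h>

/* Hypothetical result codes (not part of re1.5): 0 = no match,
 * positive = number of captures, negative = error number. */
enum {
    TOY_NOMATCH     = 0,
    TOY_ERR_DEPTH   = -1,   /* max recursion depth exceeded */
    TOY_ERR_UNICODE = -2    /* reserved for other error kinds */
};

#define TOY_MAX_DEPTH 8     /* deliberately tiny, for demonstration */

/* Toy recursive matcher: accepts a string made only of 'a' characters.
 * It recurses once per character, so a long input hits the depth limit. */
static int toy_match(const char *s, int depth)
{
    if (depth > TOY_MAX_DEPTH)
        return TOY_ERR_DEPTH;   /* report the problem instead of abort() */
    if (*s == '\0')
        return 1;               /* positive: pretend one capture group */
    if (*s != 'a')
        return TOY_NOMATCH;
    return toy_match(s + 1, depth + 1);
}

/* Caller side: map the return value onto the three documented cases. */
static const char *toy_describe(int rc)
{
    if (rc > 0)
        return "match";
    if (rc == 0)
        return "no match";
    return "error";
}
```

With this shape, a host such as uPy could translate the "error" case into an exception, since control returns to the caller rather than ending the process.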
An unknown VM instruction can be left as an abort, but even this can be changed to return a negative value, as, for example, pcre_exec() does, so that anything that goes wrong while matching a regex can raise an exception instead of aborting the program.
Similarly, I propose to fix re1_5_compilecode() to return a negative value if a particular stack depth is exceeded, in order to prevent stack overflow on regexes with many nested (). I say "negative value" here too, not just -1, in order to leave room to return other errors (like a Unicode problem).
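As a rough illustration of the compile-side check, the sketch below scans a pattern and returns a negative code when the nesting of () exceeds a limit. It is a standalone helper, not the actual re1_5_compilecode(): the function name, the SKETCH_* codes and the limit of 4 are assumptions made for this example, and it ignores character classes for brevity.

```c
#include <stddef.h>

#define SKETCH_MAX_NESTING 4    /* hypothetical limit */

/* Hypothetical error codes; distinct negative values leave room
 * for further error kinds later. */
enum {
    SKETCH_ERR_NESTING    = -1, /* too many nested groups */
    SKETCH_ERR_UNBALANCED = -2  /* unmatched '(' or ')' */
};

/* Returns the maximum group nesting depth (>= 0) of the pattern,
 * or a negative error code. Escaped characters are skipped. */
static int sketch_check_nesting(const char *re)
{
    int depth = 0, max_depth = 0;

    for (; *re != '\0'; re++) {
        if (*re == '\\') {
            if (re[1] != '\0')
                re++;           /* skip the escaped character */
            continue;
        }
        if (*re == '(') {
            if (++depth > SKETCH_MAX_NESTING)
                return SKETCH_ERR_NESTING;
            if (depth > max_depth)
                max_depth = depth;
        } else if (*re == ')') {
            if (--depth < 0)
                return SKETCH_ERR_UNBALANCED;
        }
    }
    return depth == 0 ? max_depth : SKETCH_ERR_UNBALANCED;
}
```

A real compiler would perform this check while it recurses or pushes onto its own stack, but the return convention would be the same: non-negative on success, a specific negative number per failure.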