Onigmo
Onigmo copied to clipboard
Add API to mitigate exponential backtracking (ReDoS)
Problem
Onigmo is a backtracking regex engine. For this reason, it can sometime become exceptionally slow, in particular, with this class of inputs:
regex : ^(a+)+$
string: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
Provide a library API that can limit the amount of backtracking, so that it can exit gracefully from such expensive cases.
Steps to Reproduce
Attached is a simple script to reproduce this issue. You should be able to confirm that Onigmo hangs up with:
$ cc -o sample sample.c libonigmo.a
$ ./sample
Expected Solution
PCRE2 provides pcre2_set_match_limit()
that allows callers to set a
hard limit on recursive backtracking. If it exceeds the limit, it returns
PCRE_ERROR_MATCHLIMIT
, aborting the match evaluation.
https://www.pcre.org/current/doc/html/pcre2_set_match_limit.html
It would be desriable if Onigmo provides a similar tunable parameter as well.
Attachment (reproduciable code)
#include <string.h>
#include "onigmo.h"
int main(void)
{
unsigned char *start, *range, *end;
regex_t *reg;
OnigErrorInfo einfo;
OnigRegion *region;
UChar *pattern = (UChar *) "^(a+)+$";
UChar *str = (UChar *) "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab";
onig_new(®, pattern, pattern + strlen((char *) pattern),
ONIG_OPTION_DEFAULT, ONIG_ENCODING_ASCII, ONIG_SYNTAX_DEFAULT,
&einfo);
region = onig_region_new();
end = str + strlen((char *)str);
start = str;
range = end;
onig_search(reg, str, end, start, range, region, ONIG_OPTION_NONE);
return 0;
}
FYI: Fluentd and Fluent Bit relies heavily on onigmo, having this feature in place would be great.
You can enable such check by defining USE_COMBINATION_EXPLOSION_CHECK
in regint.h
:
https://github.com/k-takata/Onigmo/blob/a06a42b51713eeafe30a939827031c3ba79da936/regint.h#L138
@k-takata What do you think about Kosako's comment on this thread:
If USE_COMBINATION_EXPLOSION_CHECK flag in src/regint.h enabled, it solves several patterns. However, it does not solve all pattern cases and it become slow. Therefore, I was not able to use it.
https://github.com/kkos/oniguruma/issues/13#issuecomment-236370385
There is, we presume, a fair question as to how it is well trenched for production usage. Indeed, this flag seems to be deprecated in the upstream project.