Onigmo icon indicating copy to clipboard operation
Onigmo copied to clipboard

Add API to mitigate exponential backtracking (ReDoS)

Open fujimotos opened this issue 4 years ago • 3 comments

Problem

Onigmo is a backtracking regex engine. For this reason, it can sometime become exceptionally slow, in particular, with this class of inputs:

regex : ^(a+)+$
string: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

Provide a library API that can limit the amount of backtracking, so that it can exit gracefully from such expensive cases.

Steps to Reproduce

Attached is a simple script to reproduce this issue. You should be able to confirm that Onigmo hangs up with:

$ cc -o sample sample.c libonigmo.a
$ ./sample

Expected Solution

PCRE2 provides pcre2_set_match_limit() that allows callers to set a hard limit on recursive backtracking. If it exceeds the limit, it returns PCRE_ERROR_MATCHLIMIT, aborting the match evaluation.

https://www.pcre.org/current/doc/html/pcre2_set_match_limit.html

It would be desriable if Onigmo provides a similar tunable parameter as well.

Attachment (reproduciable code)

#include <string.h>
#include "onigmo.h"

int main(void)
{
  unsigned char *start, *range, *end;
  regex_t *reg;
  OnigErrorInfo einfo;
  OnigRegion *region;

  UChar *pattern = (UChar *) "^(a+)+$";
  UChar *str     = (UChar *) "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab";

  onig_new(&reg, pattern, pattern + strlen((char *) pattern),
    ONIG_OPTION_DEFAULT, ONIG_ENCODING_ASCII, ONIG_SYNTAX_DEFAULT,
    &einfo);

  region = onig_region_new();
  end   = str + strlen((char *)str);
  start = str;
  range = end;
  onig_search(reg, str, end, start, range, region, ONIG_OPTION_NONE);
  return 0;
}

fujimotos avatar Jun 25 '20 14:06 fujimotos

FYI: Fluentd and Fluent Bit relies heavily on onigmo, having this feature in place would be great.

edsiper avatar Jun 25 '20 16:06 edsiper

You can enable such check by defining USE_COMBINATION_EXPLOSION_CHECK in regint.h: https://github.com/k-takata/Onigmo/blob/a06a42b51713eeafe30a939827031c3ba79da936/regint.h#L138

k-takata avatar Jun 25 '20 23:06 k-takata

@k-takata What do you think about Kosako's comment on this thread:

If USE_COMBINATION_EXPLOSION_CHECK flag in src/regint.h enabled, it solves several patterns. However, it does not solve all pattern cases and it become slow. Therefore, I was not able to use it.

https://github.com/kkos/oniguruma/issues/13#issuecomment-236370385

There is, we presume, a fair question as to how it is well trenched for production usage. Indeed, this flag seems to be deprecated in the upstream project.

fujimotos avatar Jun 25 '20 23:06 fujimotos