CH_CALLBACK_SKIP_PATTERN causes infinite loop in ch_scan() with patterns of empty matches
Hi, I recently noticed an infinite loop in our program and found the issue to be related with CH_CALLBACK_SKIP_PATTERN.
When there are patterns that may match empty strings, the CH_CALLBACK_SKIP_PATTERN return value from the callback would cause an infinite loop in ch_scan(), specifically in catchupPcre(). I've reproduced the issue with Hyperscan 5.1.1 and 5.4.0.
Below is a test program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hs/ch.h>
static int
handler(unsigned int id,
unsigned long long from, unsigned long long to,
unsigned int flags, unsigned int size,
const ch_capture_t *captured, void *ctx)
{
(void)flags;
(void)size;
(void)captured;
(void)ctx;
printf("id=%u, from=%llu, to=%llu\n", id, from, to);
// XXX!
// With a regex matching an *empty* string, CH_CALLBACK_SKIP_PATTERN
// causes an infinite loop in catchupPcre().
// However, using CH_CALLBACK_CONTINUE is okay.
return CH_CALLBACK_SKIP_PATTERN;
}
int
main(void)
{
const char *input = "foo 0123 0 bar 39483 n34jfhlqekrcoi3q4";
// test regex matching an *empty* string !
const char *regexes[] = { "[a-z]+", "[0-9]", "0|" };
unsigned int flags[] = { CH_FLAG_CASELESS, CH_FLAG_CASELESS, CH_FLAG_CASELESS };
unsigned int ids[] = { 1, 2, 3 };
unsigned int nelem = 3;
ch_database_t *db = NULL;
ch_scratch_t *scratch = NULL;
ch_compile_error_t *cerr = NULL;
ch_error_t err;
err = ch_compile_multi(regexes, flags, ids, nelem,
CH_MODE_NOGROUPS, NULL /* platform */,
&db, &cerr);
if (err != CH_SUCCESS) {
printf("ch_compile_multi() failed\n");
exit(1);
}
err = ch_alloc_scratch(db, &scratch);
if (err != CH_SUCCESS) {
printf("ch_alloc_scratch() failed\n");
exit(1);
}
err = ch_scan(db, input, strlen(input), 0 /* flags */,
scratch, handler, NULL /* onError */, NULL);
printf("err = %d\n", (int)err);
return 0;
}
And here is the output log:
id=3, from=0, to=0
id=1, from=0, to=3
id=2, from=4, to=5
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
...
So it just keeps repeating the last id=3, from=38, to=38 ...
If return CH_CALLBACK_CONTINUE from the callback, ch_scan() works correctly.
Please have a look and hope the above test program helps.
Thank you.
Had a closer look at the catchupPcre() code, and I suspect the following conditional might be the cause:
https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348
338 if (hyctx->length >= pd->scanStart &&
339 !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
340 DEBUG_PRINTF("get a new match item\n");
341 int ret = scanPcre(hyctx, len, start, top_id);
342
343 if (ret == CH_CALLBACK_TERMINATE) {
344 DEBUG_PRINTF("user callback told us to terminate scanning\n");
345 return CH_SCAN_TERMINATED;
346 } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
347 DEBUG_PRINTF("user callback told us to skip this pattern\n");
348 pd->scanStart = hyctx->length;
It looks to me the conditional at line 338 should be hyctx->length > pd->scanStart instead of >=.
Had a closer look at the
catchupPcre()code, and I suspect the following conditional might be the cause:https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348
338 if (hyctx->length >= pd->scanStart && 339 !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) { 340 DEBUG_PRINTF("get a new match item\n"); 341 int ret = scanPcre(hyctx, len, start, top_id); 342 343 if (ret == CH_CALLBACK_TERMINATE) { 344 DEBUG_PRINTF("user callback told us to terminate scanning\n"); 345 return CH_SCAN_TERMINATED; 346 } else if (ret == CH_CALLBACK_SKIP_PATTERN) { 347 DEBUG_PRINTF("user callback told us to skip this pattern\n"); 348 pd->scanStart = hyctx->length;It looks to me the conditional at line 338 should be
hyctx->length > pd->scanStartinstead of>=.
This change simply avoids infinite loop for CH_CALLBACK_SKIP_PATTERN flag in your case, but might result in losing the last match id=3, from=38, to=38 for CH_CALLBACK_CONTINUE flag.
We'll come with a fix soon.
Please refer to latest develop branch. Commit id: 10b52255fc821a5a40dd62fe41b2be4645828634