hyperscan icon indicating copy to clipboard operation
hyperscan copied to clipboard

CH_CALLBACK_SKIP_PATTERN causes infinite loop in ch_scan() with patterns of empty matches

Open liweitianux opened this issue 3 years ago • 2 comments

Hi, I recently noticed an infinite loop in our program and found the issue to be related with CH_CALLBACK_SKIP_PATTERN.

When there are patterns that may match empty strings, the CH_CALLBACK_SKIP_PATTERN return value from the callback would cause an infinite loop in ch_scan(), specifically in catchupPcre(). I've reproduced the issue with Hyperscan 5.1.1 and 5.4.0.

Below is a test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <hs/ch.h>


static int
handler(unsigned int id,
	unsigned long long from, unsigned long long to,
	unsigned int flags, unsigned int size,
	const ch_capture_t *captured, void *ctx)
{
	(void)flags;
	(void)size;
	(void)captured;
	(void)ctx;

	printf("id=%u, from=%llu, to=%llu\n", id, from, to);

	// XXX!
	// With a regex matching an *empty* string, CH_CALLBACK_SKIP_PATTERN
	// causes an infinite loop in catchupPcre().
	// However, using CH_CALLBACK_CONTINUE is okay.
	return CH_CALLBACK_SKIP_PATTERN;
}


int
main(void)
{
	const char *input = "foo 0123 0 bar 39483 n34jfhlqekrcoi3q4";

	// test regex matching an *empty* string !
	const char *regexes[] = { "[a-z]+", "[0-9]", "0|" };
	unsigned int flags[] = { CH_FLAG_CASELESS, CH_FLAG_CASELESS, CH_FLAG_CASELESS };
	unsigned int ids[] = { 1, 2, 3 };
	unsigned int nelem = 3;

	ch_database_t *db = NULL;
	ch_scratch_t *scratch = NULL;
	ch_compile_error_t *cerr = NULL;
	ch_error_t err;

	err = ch_compile_multi(regexes, flags, ids, nelem,
			CH_MODE_NOGROUPS, NULL /* platform */,
			&db, &cerr);
	if (err != CH_SUCCESS) {
		printf("ch_compile_multi() failed\n");
		exit(1);
	}

	err = ch_alloc_scratch(db, &scratch);
	if (err != CH_SUCCESS) {
		printf("ch_alloc_scratch() failed\n");
		exit(1);
	}

	err = ch_scan(db, input, strlen(input), 0 /* flags */,
			scratch, handler, NULL /* onError */, NULL);
	printf("err = %d\n", (int)err);

	return 0;
}

And here is the output log:

id=3, from=0, to=0
id=1, from=0, to=3
id=2, from=4, to=5
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
...

So it just keeps repeating the last id=3, from=38, to=38 ...

If return CH_CALLBACK_CONTINUE from the callback, ch_scan() works correctly.

Please have a look and hope the above test program helps.

Thank you.

liweitianux avatar Jul 13 '22 22:07 liweitianux

Had a closer look at the catchupPcre() code, and I suspect the following conditional might be the cause:

https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348

338        if (hyctx->length >= pd->scanStart &&
339            !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
340            DEBUG_PRINTF("get a new match item\n");
341            int ret = scanPcre(hyctx, len, start, top_id);
342
343            if (ret == CH_CALLBACK_TERMINATE) {
344                DEBUG_PRINTF("user callback told us to terminate scanning\n");
345                return CH_SCAN_TERMINATED;
346            } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
347                DEBUG_PRINTF("user callback told us to skip this pattern\n");
348                pd->scanStart = hyctx->length;

It looks to me the conditional at line 338 should be hyctx->length > pd->scanStart instead of >=.

liweitianux avatar Jul 14 '22 02:07 liweitianux

Had a closer look at the catchupPcre() code, and I suspect the following conditional might be the cause:

https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348

338        if (hyctx->length >= pd->scanStart &&
339            !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
340            DEBUG_PRINTF("get a new match item\n");
341            int ret = scanPcre(hyctx, len, start, top_id);
342
343            if (ret == CH_CALLBACK_TERMINATE) {
344                DEBUG_PRINTF("user callback told us to terminate scanning\n");
345                return CH_SCAN_TERMINATED;
346            } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
347                DEBUG_PRINTF("user callback told us to skip this pattern\n");
348                pd->scanStart = hyctx->length;

It looks to me the conditional at line 338 should be hyctx->length > pd->scanStart instead of >=.

This change simply avoids infinite loop for CH_CALLBACK_SKIP_PATTERN flag in your case, but might result in losing the last match id=3, from=38, to=38 for CH_CALLBACK_CONTINUE flag.

We'll come with a fix soon.

hongyang7 avatar Jul 20 '22 16:07 hongyang7

Please refer to latest develop branch. Commit id: 10b52255fc821a5a40dd62fe41b2be4645828634

hongyang7 avatar Oct 28 '22 08:10 hongyang7