SEGV in hs_scan with HS_FLAG_SOM_LEFTMOST

Hello there,

I was experimenting with scratch buffer allocation and noticed that one specific pattern would cause a crash. The code I used is as follows: I have two similar patterns, and I only allocate the scratch buffer for pattern1, not for pattern2:

#include <hs/hs_compile.h>
#include <stdio.h>
#include <string.h>
#include <hs/hs.h>

static int eventHandler(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
    printf("Match for pattern \"%s\" at offset %llu\n", (char *)ctx, to);
    return 0;
}

int main(int argc, char *argv[]) {
    const char *pattern1 = "^(ftps?|https?).*?\\?+$";
    const char *pattern2 = "^(ftps?|https?)://.*$";

    hs_compile_error_t *compile_err;
    hs_database_t *database1;
    hs_compile(pattern1, HS_FLAG_SOM_LEFTMOST, HS_MODE_BLOCK, NULL, &database1,
                   &compile_err);

    hs_database_t *database2;
    hs_compile(pattern2, HS_FLAG_SOM_LEFTMOST, HS_MODE_BLOCK, NULL, &database2,
                   &compile_err);

    const char *inputData = "https://foo.bar/";
    hs_scratch_t *scratch = NULL;
    /* Scratch is allocated for database1 only, never for database2. */
    hs_alloc_scratch(database1, &scratch);
    hs_scan(database1, inputData, strlen(inputData), 0, scratch, eventHandler,
            (void *)pattern1);
    /* database2 is scanned with a scratch that was never sized for it. */
    hs_scan(database2, inputData, strlen(inputData), 0, scratch, eventHandler,
            (void *)pattern2);

    return 0;
}
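
(For reference, not part of the original report: the hs_compile calls above ignore their return values for brevity. A compile failure would normally be caught like this; hs_compile returns HS_SUCCESS on success and fills in compile_err otherwise.)

hs_error_t cerr = hs_compile(pattern1, HS_FLAG_SOM_LEFTMOST, HS_MODE_BLOCK,
                             NULL, &database1, &compile_err);
if (cerr != HS_SUCCESS) {
    /* compile_err->message describes the failure; the error object
     * must be freed by the caller. */
    fprintf(stderr, "hs_compile failed: %s\n", compile_err->message);
    hs_free_compile_error(compile_err);
    return 1;
}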

Expected result: the second hs_scan call should return an error, since the scratch buffer was not allocated for the second database.
Actual result: the second hs_scan call crashes with SIGSEGV. Here is the stack trace:

#0  0x00007ffff7c3b5cd in memset (__len=32, __ch=0, __dest=0x0)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#1  fatbit_clear (bits=0x0) at ./src/util/fatbit.h:60
#2  recordAnchoredLiteralMatch (end=8, anch_id=0, scratch=0x7fffcc00f240, t=<optimized out>)
    at ./src/rose/program_runtime.c:101
#3  roseRunProgram (t=<optimized out>, scratch=<optimized out>, programOffset=<optimized out>, som=0,
    end=<optimized out>, prog_flags=<optimized out>) at ./src/rose/program_runtime.c:2278
#4  0x00007ffff7c22fd0 in roseAnchoredCallback (start=start@entry=0, end=<optimized out>,
    id=id@entry=1888, ctx=ctx@entry=0x7fffcc00f240) at ./src/rose/match.c:215
#5  0x00007ffff7bd5a04 in doComplexReport (cached_accept_id=<synthetic pointer>,
    cached_accept_state=<synthetic pointer>, eod=0 '\000', loc=8, s=21, m=<optimized out>,
    ctxt=0x7fffcc00f240, cb=0x7ffff7c22f70 <roseAnchoredCallback>) at ./src/nfa/mcclellan.c:74
#6  mcclellanExec8_i (mode=CALLBACK_OUTPUT, c_final=0x0, single=0 '\000', ctxt=0x7fffcc00f240,
    cb=0x7ffff7c22f70 <roseAnchoredCallback>, offAdj=0, len=8, buf=0x7fffd4014580 "https://foo.bar/",
    state=<synthetic pointer>, m=<optimized out>) at ./src/nfa/mcclellan.c:502
#7  nfaExecMcClellan8_Bi (single=0 '\000', context=0x7fffcc00f240,
    length=8, buffer=0x7fffd4014580 "https://foo.bar/",
    offset=0, n=<optimized out>) at ./src/nfa/mcclellan.c:922
#8  nfaExecMcClellan8_B (n=<optimized out>, offset=0, buffer=0x7fffd4014580 "https://foo.bar/", length=8,
    cb=0x7ffff7c22f70 <roseAnchoredCallback>, context=0x7fffcc00f240) at ./src/nfa/mcclellan.c:945
#9  0x00007ffff7c0dbe1 in runAnchoredTableBlock (scratch=0x7fffcc00f240, atable=<optimized out>,
    t=0x555557cfde00) at ./src/rose/block.c:63
#10 roseBlockAnchored (scratch=0x7fffcc00f240, t=0x555557cfde00) at ./src/rose/block.c:212
#11 roseBlockExec (t=<optimized out>, scratch=<optimized out>) at ./src/rose/block.c:395
#12 0x00007ffff7b44f4f in rawBlockExec (scratch=0x7fffcc00f240, rose=0x555557cfde00)
    at ./src/runtime.c:188
#13 hs_scan (db=<optimized out>, data=<optimized out>, length=15, flags=<optimized out>,
    scratch=0x7fffcc00f240, onEvent=<optimized out>, userCtx=<optimized out>) at ./src/runtime.c:419

Note: there is no crash if I remove the HS_FLAG_SOM_LEFTMOST flag.
Workaround: reallocating the scratch buffer for the second database resolves the issue.
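
For illustration, the workaround is a single extra call before the second scan: hs_alloc_scratch() accepts an already-allocated scratch and grows it in place. A minimal sketch:

hs_alloc_scratch(database1, &scratch);
/* Growing the same scratch for database2 avoids the crash. */
hs_alloc_scratch(database2, &scratch);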

It sounds to me like some sanity checks are missing inside hs_scan.
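
For comparison, a call site that checks the return value would surface the problem cleanly if hs_scan performed such a check; assuming it returned an error code such as HS_INVALID instead of crashing, the caller could do:

hs_error_t err = hs_scan(database2, inputData, strlen(inputData), 0, scratch,
                         eventHandler, (void *)pattern2);
if (err != HS_SUCCESS) {
    /* An undersized scratch would show up here as an error code
     * rather than a SIGSEGV. */
    fprintf(stderr, "hs_scan failed: error %d\n", err);
    return 1;
}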

noboruma avatar Mar 09 '22 01:03 noboruma

Thanks for reporting this. This is indeed a long-standing issue. We'll consider adding a suitable quick check for scratch validity later.

hongyang7 avatar May 31 '22 17:05 hongyang7

Please refer to the latest develop branch. Commit id: 7c65cfce4e4a5c153ff995113c37570a95675417

hongyang7 avatar Oct 27 '22 18:10 hongyang7

The Hyperscan documentation recommends allocating one scratch space per scanning context.

Suricata relies on this performance technique, and the commit for this change caused this Suricata issue. I investigated and found that the validity check comparing the scratch and database CRC values was detecting a mismatch: the scratch space is per Suricata thread context, while a per-database scratch space was expected.

Here's the performance tip from the documentation: "A scratch space can be allocated so that it can be used with any one of a number of databases. Each concurrent scan operation (such as a thread) needs its own scratch space."

The hs_alloc_scratch() function can accept an existing scratch space and “grow” it to support scanning with another pattern database. This means that instead of allocating one scratch space for every database used by an application, one can call hs_alloc_scratch() with a pointer to the same hs_scratch_t and it will be sized appropriately for use with any of the given databases.
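
As a minimal sketch of that pattern (db1, db2, and the variable names here are placeholders), one prototype scratch can be grown across several databases and then cloned for each scanning thread with hs_clone_scratch():

hs_scratch_t *proto = NULL;
hs_alloc_scratch(db1, &proto);        /* sized for db1 */
hs_alloc_scratch(db2, &proto);        /* grown in place to also fit db2 */

hs_scratch_t *per_thread = NULL;
hs_clone_scratch(proto, &per_thread); /* one clone per scanning thread */
/* per_thread can now be used to scan against either db1 or db2 */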

jlucovsky avatar Mar 05 '23 15:03 jlucovsky

@jlucovsky Yes, you're right. We already noticed this issue in the Snort integration and in Rspamd. A quick rollback patch for this particular issue will be available soon. Sorry for any inconvenience.

hongyang7 avatar Mar 05 '23 18:03 hongyang7

Thanks for addressing the issue so quickly!

jlucovsky avatar Mar 05 '23 21:03 jlucovsky

@jlucovsky Please refer to the latest master. Fix: https://github.com/intel/hyperscan/commit/0bf86a7c15823ee80061317a997842211558bfa4 If you have any questions, let us know.

hongyang7 avatar Mar 06 '23 08:03 hongyang7

Thanks! I've verified proper operation on the Suricata LTS and current (master) releases.

jlucovsky avatar Mar 06 '23 13:03 jlucovsky

@hongyang7 Will there be a new release with this fix? Many of our users only use "released" software. Could the change be included in 5.4.2?

jlucovsky avatar Mar 08 '23 13:03 jlucovsky

@jlucovsky Please check the latest update.

hongyang7 avatar Apr 19 '23 09:04 hongyang7