
High memory usage with `coreruleset`

bkupidura opened this issue 2 years ago • 7 comments

Description

After loading all rules from the coreruleset (https://coraza.io/docs/tutorials/coreruleset/), the Coraza instance consumes 130 MB.

From a quick pprof session, it looks like the issue is in `github.com/cloudflare/ahocorasick.(*Matcher).buildTrie`.

I wouldn't care, but nginx with ModSecurity (the modsecurity-crs Docker image) consumes only 40 MB with the exact same ruleset.

Is there anything I'm missing?

Steps to reproduce

Create a Coraza instance as described in https://coraza.io/docs/tutorials/coreruleset/

Expected result

Much lower memory usage.

Actual result

High memory usage :)

bkupidura, May 25 '22

Hey @bkupidura, can you share the whole pprof profile, and the requests you are performing?

There are many issues with the Aho-Corasick library; it might need to be replaced.

jptosso, May 25 '22

Can you confirm that you are testing against transactions? It seems you are measuring the Aho-Corasick initialization.

The garbage collector should free those resources no more than a second after the rules are compiled.
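One way to tell retained memory from garbage that simply has not been collected yet is to force a collection and then read `runtime.MemStats.HeapAlloc`, which right after `runtime.GC()` roughly reflects what is still reachable. The sketch below uses only the standard library; `liveHeapBytes` and `retained` are hypothetical names, and the 64 MiB slice merely stands in for compiled rules.

```go
package main

import (
	"fmt"
	"runtime"
)

// liveHeapBytes forces a full collection, then reports HeapAlloc. Right after
// runtime.GC() this is approximately the heap still reachable by the program.
func liveHeapBytes() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapAlloc
}

// retained plays the role of a long-lived global such as a package-level WAF:
// anything it references stays reachable and cannot be freed by the GC.
var retained [][]byte

func main() {
	base := liveHeapBytes()

	retained = append(retained, make([]byte, 64<<20)) // 64 MiB, still referenced
	fmt.Printf("while referenced: %+d MiB\n", (int64(liveHeapBytes())-int64(base))>>20)

	retained = nil // drop the only reference
	fmt.Printf("after release:    %+d MiB\n", (int64(liveHeapBytes())-int64(base))>>20)
}
```

If the loaded rules keep their pattern matchers for use at request time (an assumption about Coraza's internals, not confirmed in this thread), that memory would stay reachable through the WAF, and a constant footprint would be expected rather than a GC failure.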

jptosso, May 25 '22

The high memory usage is constant, so it looks like the GC is not freeing those resources.

Here is a very simple program that behaves exactly the same way, so you can check it:

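```go
package main

import (
	"log"
	"net/http"

	// Registers the /debug/pprof/* handlers on http.DefaultServeMux.
	_ "net/http/pprof"

	"github.com/corazawaf/coraza/v2"
	"github.com/corazawaf/coraza/v2/seclang"
)

// waf lives in a package-level variable, so it (and every compiled rule it
// references) stays reachable for the lifetime of the process.
var waf *coraza.Waf

// setupCoraza creates a WAF and loads coraza.conf plus the OWASP Core Rule Set.
func setupCoraza() error {
	waf = coraza.NewWaf()

	// Named parser so it does not shadow the seclang package.
	parser, err := seclang.NewParser(waf)
	if err != nil {
		return err
	}

	files := []string{
		"coraza.conf",
		"coreruleset/crs-setup.conf.example",
		"coreruleset/rules/*.conf", // glob pattern: every CRS rule file
	}
	for _, f := range files {
		if err := parser.FromFile(f); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := setupCoraza(); err != nil {
		panic(err)
	}
	// Serves only the pprof endpoints, e.g. /debug/pprof/heap.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```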

bkupidura, May 25 '22

That makes sense; we will look into it in detail. I would personally like to remove Aho-Corasick from the project. @piyushroshan, do you think we could implement a similar algorithm in a more idiomatic Go way? @fzipi, do you think replacing the algorithm would break compatibility with ModSecurity?

jptosso, May 25 '22

We can try different libraries, for example https://github.com/BobuSumisu/aho-corasick. See https://github.com/Bobusumisu/aho-corasick-benchmark.

jcchavezs, Jun 13 '22

As mentioned, the memory consumption will be quite high compared to a double-array trie implementation, especially during the build phase (which currently involves a lot of object allocations).

We want to lower memory consumption during the build phase.

jptosso, Jun 13 '22

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Jul 14 '22

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Aug 14 '22

Fixed by replacing the Aho-Corasick library.

jptosso, Aug 18 '22