coraza
High memory usage with `coreruleset`
Description
After loading all rules from coreruleset (https://coraza.io/docs/tutorials/coreruleset/), a Coraza instance consumes 130 MB of memory.
From quick profiling with pprof, the issue appears to be in github.com/cloudflare/ahocorasick.(*Matcher).buildTrie.
I wouldn't care, but nginx with ModSecurity (the modsecurity-crs Docker image) consumes 40 MB with the exact same ruleset.
Is there anything I'm missing?
Steps to reproduce
Create a Coraza instance based on https://coraza.io/docs/tutorials/coreruleset/
Expected result
Much lower memory usage.
Actual result
High memory usage :)
Hey @bkupidura, can you share the whole pprof profile and the requests you are performing?
There are many issues with the Aho-Corasick library; it might need to be replaced.
Can you confirm that you are measuring while processing transactions? It seems you are measuring the Aho-Corasick initialization.
The garbage collector should free those resources no more than a second after the rules are compiled.
The high memory usage is constant, so it looks like the GC is not freeing those resources.
Here is a very simple program that behaves exactly the same way, so you can check:
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux

	"github.com/corazawaf/coraza/v2"
	"github.com/corazawaf/coraza/v2/seclang"
)

var waf *coraza.Waf

// setupCoraza creates the WAF and loads the Coraza config and the CRS rules.
func setupCoraza() error {
	waf = coraza.NewWaf()
	// Named parser so the variable does not shadow the seclang package.
	parser, err := seclang.NewParser(waf)
	if err != nil {
		return err
	}
	files := []string{
		"coraza.conf",
		"coreruleset/crs-setup.conf.example",
		"coreruleset/rules/*.conf",
	}
	for _, f := range files {
		if err := parser.FromFile(f); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := setupCoraza(); err != nil {
		panic(err)
	}
	// No WAF handler is attached; the server only exposes pprof for inspection.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
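A quick way to check whether memory is actually retained after setup, rather than waiting for a collection, is to force a GC and read the live heap size. This is only a sketch using the standard runtime package: `buildRetained` and the `retained` variable are hypothetical stand-ins for the compiled rule set, not Coraza code. In the program above, the same `heapInUseAfterGC` helper could be called before and after `setupCoraza()`.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUseAfterGC forces a garbage collection and returns the number of
// bytes of heap objects that are still allocated afterwards.
func heapInUseAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// retained is a hypothetical stand-in for long-lived data such as a
// compiled rule set. It is package-level so the GC must keep it alive.
var retained [][]byte

// buildRetained allocates n buffers of size bytes each and keeps them reachable.
func buildRetained(n, size int) {
	retained = make([][]byte, n)
	for i := range retained {
		retained[i] = make([]byte, size)
	}
}

func main() {
	before := heapInUseAfterGC()
	buildRetained(256, 64*1024) // roughly 16 MiB kept reachable
	after := heapInUseAfterGC()

	retained = nil // drop the only reference
	released := heapInUseAfterGC()

	fmt.Printf("before build:  %d KiB\n", before/1024)
	fmt.Printf("after build:   %d KiB\n", after/1024)
	fmt.Printf("after release: %d KiB\n", released/1024)
}
```

If the "after build" figure stays high even after a forced GC, the memory is still referenced by live objects, so the collector cannot reclaim it no matter how long it runs.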
That makes sense, we will look into it in detail. I would personally like to remove aho-corasick from the project. @piyushroshan do you think we could use a similar algorithm in a more idiomatic Go way? @fzipi do you think replacing the algorithm would break compatibility with ModSecurity?
We can try different libraries, for example https://github.com/BobuSumisu/aho-corasick. See https://github.com/Bobusumisu/aho-corasick-benchmark.
As mentioned, the memory consumption will be quite high compared to a double-array trie implementation, especially during the build phase (which currently performs a lot of object allocations).
We want to lower the memory consumption during the build phase.
This issue is stale because it has been open for 30 days with no activity.
Fixed by replacing the Aho-Corasick library.