Ignite netrisk with HyperLogLog?

Open RubieV opened this issue 9 years ago • 3 comments

@markharwood,

Great work on significant terms, maybe even greater visualization of the 4 strategies in your comment!

Working in the same space, yet having access to more detailed data, we have found input carnality to be higher related with attack impact than relative volume, if one vector has to be chosen.

I would love to contribute a PR for this, but to demo it in your project the dummy data would have to include URI/UA hashes or values.

By the way, do we see you in Berlin at GOTO?

Ruben

RubieV, May 04 '16 01:05

Thanks for your comments and PR https://github.com/markharwood/netrisk/pull/2 - I just merged it.

> we have found input carnality to be higher related with attack impact than relative volume

I'm guessing you mean "cardinality" of something here? Possibly UAs per subnet? Some time ago I sketched out the typical entities in web interactions and the expected cardinality of each of their relations, with reasons for exceptions: cardinalities

This is clearly a more complex model for assessing risk, and implementing it requires a different approach (see "entity centric indexing"), which is beyond the scope of this project.
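For readers who want to try the "UAs per subnet" idea, here is a minimal sketch of an Elasticsearch search body that ranks subnets by their approximate number of distinct user agents. It uses the `cardinality` aggregation, which returns approximate counts based on HyperLogLog++. The field names `subnet` and `ua_hash` and the helper name are assumptions made for illustration; they are not part of the netrisk project or its dummy data.

```python
import json


def uas_per_subnet_query(subnet_field="subnet", ua_field="ua_hash",
                         top_n=10, precision_threshold=1000):
    """Build a search body: top subnets ranked by distinct user agents.

    The field names are hypothetical; adapt them to your own mapping.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_subnet": {
                "terms": {
                    "field": subnet_field,
                    "size": top_n,
                    # sort buckets by the sub-aggregation defined below
                    "order": {"distinct_uas": "desc"},
                },
                "aggs": {
                    "distinct_uas": {
                        # approximate distinct count of user agents per subnet
                        "cardinality": {
                            "field": ua_field,
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(uas_per_subnet_query(), indent=2))
```

The resulting dictionary can be sent as the body of a search request against an index of web logs. Raising `precision_threshold` trades memory for accuracy on low-cardinality buckets.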

I don't have any plans to develop this "netrisk" project beyond its current simple form - it was built to support a blog post showing how some of the Elasticsearch aggregations can be applied in practice. Feel free to fork it of course if you find it useful :)

markharwood, May 04 '16 08:05

Sorry, cardinality is indeed what I meant.

We actually use entity-centric views, yet currently don't cache them back to ES. Would this be the same as an ES-backed cache for event sourcing?

It's funny you mention this example, as we've hacked together a browser plugin that ships fingerprints to ES, where application and webserver logs reside. Using sign-terms and a graph it's pretty powerful on very diverse datasets. Depending on cluster size, high cardinality fields might as well be used instead of significant terms for cached performance.

Do you happen to know an entity-centric indexing / event sourcing framework that supports both ETL (per single event) and ES-backed aggregations (historic aggregated events)?

Browser shipper: https://git.bitsensor.io/ruben/browser/blob/master/src/index.js
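As a rough illustration of what "entity-centric" means in this exchange, the sketch below folds individual events into one summary per entity (here, per client IP) and reports the cardinality of user agents and URIs seen for each. The event fields `ip`, `ua` and `uri` and the function name are hypothetical, and exact Python sets stand in for a HyperLogLog sketch, so this is only suitable for small inputs.

```python
from collections import defaultdict


def fold_events(events, key="ip"):
    """Fold single events into one summary document per entity.

    Each event is a dict with the (assumed) fields 'ip', 'ua' and 'uri'.
    """
    state = defaultdict(lambda: {"hits": 0, "uas": set(), "uris": set()})
    for event in events:
        entity = state[event[key]]
        entity["hits"] += 1             # relative volume
        entity["uas"].add(event["ua"])  # distinct user agents
        entity["uris"].add(event["uri"])  # distinct URIs
    # Emit plain documents that could be indexed as one document per entity.
    return {
        entity_id: {
            "hits": s["hits"],
            "ua_cardinality": len(s["uas"]),
            "uri_cardinality": len(s["uris"]),
        }
        for entity_id, s in state.items()
    }


if __name__ == "__main__":
    sample = [
        {"ip": "10.0.0.1", "ua": "curl", "uri": "/login"},
        {"ip": "10.0.0.1", "ua": "wget", "uri": "/login"},
        {"ip": "10.0.0.2", "ua": "curl", "uri": "/"},
    ]
    print(fold_events(sample))
```

The per-event update corresponds to the ETL step, while the emitted summaries play the role of the historic aggregated view; a production setup would persist the state rather than hold it in memory.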

RubieV, May 04 '16 14:05

> Do you happen to know an entity-centric indexing / event sourcing framework that supports both ETL (per single event) and ES-backed aggregations (historic aggregated events)?

http://snowplowanalytics.com/product/ is a big project in this area with "trackers" for a variety of client platforms.

markharwood, May 04 '16 16:05