
Mongo Aggregations or MapReduce queries

Open StephenOTT opened this issue 5 years ago • 6 comments

It would be great (unless I missed it in the docs) to have query generation for Mongo aggregation and/or MapReduce queries.

StephenOTT avatar Jan 10 '19 15:01 StephenOTT

@StephenOTT Can you go into more detail as to what you're looking for here?

JasonKeirstead avatar Feb 04 '19 14:02 JasonKeirstead

Shifter can convert a STIX pattern into another format, for example an Elasticsearch query. It would be great if the same could be done for Mongo, so you can go from a STIX pattern to a Mongo aggregation query. (A Mongo aggregation query is just another JSON object.)
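To make the request concrete, here is a minimal sketch of the kind of translation meant here. It is not stix-shifter code: the `FIELD_MAP` entries, the `stix_to_pipeline` helper, and the collection field names are all hypothetical, and it handles only a single equality comparison rather than the full STIX pattern grammar.

```python
import re

# Hypothetical mapping from STIX object paths to fields in a Mongo collection.
# These field names are assumptions; a real data source defines its own schema.
FIELD_MAP = {
    "ipv4-addr:value": "src_ip",
    "network-traffic:dst_port": "dst_port",
}

# Matches only the simplest pattern form:
#   [object:path = 'string']  or  [object:path = 123]
SIMPLE_COMPARISON = re.compile(
    r"^\[\s*([\w-]+:[\w.]+)\s*=\s*(?:'([^']*)'|(\d+))\s*\]$"
)


def stix_to_pipeline(pattern):
    """Translate one simple STIX equality comparison into a Mongo pipeline."""
    match = SIMPLE_COMPARISON.match(pattern.strip())
    if match is None:
        raise ValueError("only a single '=' comparison is supported in this sketch")
    path, text, number = match.groups()
    if path not in FIELD_MAP:
        raise ValueError("no field mapping for STIX path: " + path)
    # Quoted values stay strings; bare digits become integers.
    value = text if text is not None else int(number)
    # An aggregation pipeline is a list of stage documents.
    return [{"$match": {FIELD_MAP[path]: value}}]


pipeline = stix_to_pipeline("[ipv4-addr:value = '192.0.2.1']")
```

The resulting list could be passed to a driver call such as pymongo's `collection.aggregate(pipeline)`. The translation depends entirely on the field mapping, so it only works once the layout of the data in Mongo is known.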

StephenOTT avatar Feb 04 '19 14:02 StephenOTT

@StephenOTT First, the format and layout of the security data in Mongo that we'd be querying would need to be defined. Shifter doesn't work if it doesn't understand the data, and since a Mongo database can contain "anything", this is a problem.

JasonKeirstead avatar Feb 04 '19 14:02 JasonKeirstead

I think that would be fine. Aren't you basically doing the same for Elastic indexes?

StephenOTT avatar Feb 04 '19 21:02 StephenOTT

@StephenOTT Kind of, except with Elastic we have some standard schemas to target. MITRE has defined a translation to their CAR schema, and we will also be developing a translation to the ECS standard schema. The goal of Shifter is to work "out of the box" for most security products.

If there is some kind of standard schema for Mongo you have in mind that is in use in a product we would definitely look at this.

JasonKeirstead avatar Mar 22 '19 19:03 JasonKeirstead

I don't know anything about MongoDB, but some data sources have aggregations. For example, in QRadar AQL you can GROUP BY and then use an aggregation function (IIUC). STIX Observed Data objects have first_observed, last_observed, and number_observed, so it seems like we should be able to handle simple "count" aggregations, at least for data sources that support it.

Supporting such aggregations gives us a way to "push" some of the computational burden down the stack, and reduce the amount of data transmitted.
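As a rough illustration of pushing a "count" aggregation down to the data source, the sketch below builds a Mongo pipeline whose `$group` stage computes the three observation properties on the server. The helper name `observation_count_pipeline` and the `src_ip` and `timestamp` fields are assumptions made for the example, not part of stix-shifter or any standard schema.

```python
def observation_count_pipeline(match_filter, group_field, time_field="timestamp"):
    """Build a pipeline that filters events, then summarizes them per group."""
    return [
        # Filter first so the group stage only sees relevant documents.
        {"$match": match_filter},
        {
            "$group": {
                "_id": "$" + group_field,                   # one result per distinct value
                "first_observed": {"$min": "$" + time_field},
                "last_observed": {"$max": "$" + time_field},
                "number_observed": {"$sum": 1},             # count matching documents
            }
        },
    ]


# Hypothetical field names; a real collection would use its own.
pipeline = observation_count_pipeline({"dst_port": 443}, "src_ip")
```

With a pipeline like this, the database returns one summary document per group instead of every matching event, which is where the reduction in transmitted data would come from.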

pcoccoli avatar Dec 04 '19 15:12 pcoccoli