
Improvement: Support JSON key wildcarding

Open · ghost opened this issue 8 years ago · 3 comments

Problem

Some JSON logs have dynamic keys, i.e., key names that are not fixed in advance.

Example: CarbonBlack's feed.storage.hit.process

  • Example key: alliance_data_<FEED_NAME>

Here <FEED_NAME> is bit9endpointvisibility, bit9earlyaccess, or one of a dozen other feeds, and the set of feed names can grow and change over time.

Proposal

Support wildcarding, like so:

        "alliance_data_*": "string",
        "alliance_link_*": "string",
        "alliance_score_*": "string",
        "alliance_updated_*": "string"

This would need to be implemented at the schema level, meaning it should work whether the keys are specified under schema, optional_top_level_keys, or envelope_keys.
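A minimal sketch of how the proposed wildcard keys might be resolved when validating a record, assuming Python (StreamAlert's implementation language) and the standard-library fnmatch module. The helper names (schema_type_for, record_matches_schema) and the TYPE_MAP mapping are hypothetical and are not StreamAlert's parser API. Exact keys are tried first; wildcard patterns are consulted only when no exact key matches.

```python
from fnmatch import fnmatchcase

# Hypothetical schema using the wildcard keys proposed above.
SCHEMA = {
    "alliance_data_*": "string",
    "alliance_link_*": "string",
    "alliance_score_*": "string",
    "alliance_updated_*": "string",
}

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int}


def schema_type_for(key, schema):
    """Return the declared type for key: exact match first, then wildcard patterns."""
    if key in schema:
        return schema[key]
    for pattern, declared_type in schema.items():
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
        if fnmatchcase(key, pattern):
            return declared_type
    return None


def record_matches_schema(record, schema):
    """True if every key in record is covered by the schema with a value of the declared type."""
    for key, value in record.items():
        declared = schema_type_for(key, schema)
        if declared is None:
            return False  # no exact key or pattern covers this key
        if not isinstance(value, TYPE_MAP.get(declared, object)):
            return False
    return True


record = {
    "alliance_data_bit9endpointvisibility": "some feed data",
    "alliance_score_bit9earlyaccess": "100",
}
print(record_matches_schema(record, SCHEMA))  # True
```

This only checks that each record key is covered; it does not require every pattern to match at least one key, which is one of the design questions a real implementation would need to settle. The same lookup could, in principle, be reused for schema, optional_top_level_keys, and envelope_keys.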

ghost · May 15 '17 23:05

One issue with this approach is that writing rules will be really tricky when the field names are not known in advance. By contrast, optional_top_level_keys keeps a consistent schema regardless of whether certain fields exist in a given record.
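To make the rule-writing concern concrete: with fixed keys a rule can read a field directly, but with wildcard keys it first has to discover which fields are present. The sketch below assumes Python; values_matching and high_alliance_score are hypothetical names (not StreamAlert's rule API), and the threshold of 75 is arbitrary.

```python
from fnmatch import fnmatchcase


def values_matching(record, pattern):
    """Return the subset of record whose keys match a wildcard pattern."""
    return {key: value for key, value in record.items() if fnmatchcase(key, pattern)}


def high_alliance_score(rec, threshold=75):
    """Hypothetical rule body: True if any feed's alliance score reaches threshold."""
    scores = values_matching(rec, "alliance_score_*")
    # The proposed schema declares scores as "string", so non-numeric values are
    # skipped and numeric ones converted before comparing.
    return any(int(score) >= threshold for score in scores.values() if score.isdigit())


rec = {
    "alliance_score_bit9earlyaccess": "80",
    "alliance_data_bit9earlyaccess": "some feed data",
}
print(high_alliance_score(rec))  # True
```

Every rule touching these fields would need this kind of discovery step, whereas a key with a consistent schema can simply be read by name.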

jacknagz · May 15 '17 23:05

@jacknagz To ensure we're on the same page, you're outlining a con, but it's a con we'll have to deal with in order to properly support this log type and others like it, correct?

ghost · May 16 '17 00:05

@mime-frame, @jacknagz This may be something we want to define per log type in logs.json, only for the logs where it applies. For instance, the log's configuration map could include a key like "support_key_wildcards": true.

The parser could then inspect this flag and only attempt wildcard parsing for log types that support it. Otherwise, I foresee an avoidable performance hit if we try to do magic like fnmatch for every key in every defined log.

EDIT: We could also do our best in the JSON parser to avoid forking the logic too much when key wildcards are supported.
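A minimal sketch of the gating idea, assuming Python: each log type's schema is split once into exact keys and wildcard patterns, and patterns are honored only when the proposed support_key_wildcards flag is set. The logs.json-style entry, the log name, example_fixed_key, and the split_schema / lookup_type helpers are hypothetical; support_key_wildcards is the flag suggested in this thread, not an existing option. Log types without the flag end up with an empty pattern list, so both cases share one lookup path and never call fnmatch unnecessarily.

```python
from fnmatch import fnmatchcase

# Hypothetical logs.json-style entry. "support_key_wildcards" is the flag proposed
# in this thread; "example_fixed_key" stands in for any ordinary, fixed key.
LOG_CONFIG = {
    "carbonblack:feed.storage.hit.process": {
        "schema": {"example_fixed_key": "string", "alliance_data_*": "string"},
        "parser": "json",
        "configuration": {"support_key_wildcards": True},
    },
}

WILDCARD_CHARS = "*?["  # characters with special meaning to fnmatch


def split_schema(schema, wildcards_enabled):
    """Split schema into exact keys and wildcard patterns once, at config-load time."""
    if not wildcards_enabled:
        # Every key is treated literally, so no pattern matching ever happens.
        return dict(schema), []
    exact = {k: v for k, v in schema.items() if not any(c in k for c in WILDCARD_CHARS)}
    patterns = [(k, v) for k, v in schema.items() if k not in exact]
    return exact, patterns


def lookup_type(key, exact, patterns):
    """One lookup path for all log types: dict lookup first, fnmatch only on patterns."""
    if key in exact:
        return exact[key]
    for pattern, declared_type in patterns:
        if fnmatchcase(key, pattern):
            return declared_type
    return None


entry = LOG_CONFIG["carbonblack:feed.storage.hit.process"]
flag = entry["configuration"].get("support_key_wildcards", False)
exact, patterns = split_schema(entry["schema"], flag)
print(lookup_type("alliance_data_bit9earlyaccess", exact, patterns))  # string
```

Doing the split once when the config is loaded keeps the per-key cost for non-wildcard log types to a plain dictionary lookup, which addresses the performance concern without a separate parsing branch.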

ryandeivert · May 17 '17 22:05