streamalert
Improvement: Support JSON key wildcarding
Problem
Some JSON logs have keys that are dynamic
Example: CarbonBlack's feed.storage.hit.process
- Example key: alliance_data_<FEED_NAME>
Where <FEED_NAME> is bit9endpointvisibility, bit9earlyaccess, or one of a dozen other feeds. These feed names can grow and change over time.
Proposal
Support wildcarding, like so:
"alliance_data_*": "string",
"alliance_link_*": "string",
"alliance_score_*": "string",
"alliance_updated_*": "string"
This would need to be implemented at the schema level, meaning it should work whether the keys are specified under schema, optional_top_level_keys, or envelope_keys.
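As a rough illustration of what schema-level wildcard matching could look like, here is a minimal sketch using Python's standard fnmatch module. The helper names (match_schema_key, unmatched_keys) and the SCHEMA dict are hypothetical and are not part of the existing parser; they only show the idea of resolving a concrete record key to the wildcard schema key that covers it.

```python
from fnmatch import fnmatchcase

# Hypothetical schema fragment using the wildcard keys proposed above.
SCHEMA = {
    "alliance_data_*": "string",
    "alliance_link_*": "string",
    "alliance_score_*": "string",
    "alliance_updated_*": "string",
}


def match_schema_key(record_key, schema):
    """Return the schema key (exact or wildcard) that covers record_key, or None."""
    # Exact matches are checked first so plain keys never pay for pattern matching.
    if record_key in schema:
        return record_key
    for schema_key in schema:
        if "*" in schema_key and fnmatchcase(record_key, schema_key):
            return schema_key
    return None


def unmatched_keys(record, schema):
    """List the record keys that no schema key (exact or wildcard) covers."""
    return [key for key in record if match_schema_key(key, schema) is None]
```

With this sketch, a key such as alliance_data_bit9endpointvisibility resolves to the schema key alliance_data_*, while a key the schema does not mention shows up in unmatched_keys.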
One issue with this approach is that writing rules will be really tricky when all the field names are unknown, as opposed to optional_top_level_keys, which has a consistent schema even if only certain fields exist in a record.
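To make the rule-writing concern concrete, here is a minimal sketch of how a rule might cope with dynamic key names: instead of reading a fixed field, it collects every key that matches a pattern. Both functions and the threshold are hypothetical, and the int() conversion assumes the score values arrive as numeric strings, as the "string" type in the proposal suggests.

```python
from fnmatch import fnmatchcase


def values_matching(record, pattern):
    """Return {key: value} for every record key that matches the wildcard pattern."""
    return {key: value for key, value in record.items() if fnmatchcase(key, pattern)}


def high_alliance_score(record, threshold=50):
    """Hypothetical rule body: True if any feed's score meets the threshold."""
    scores = values_matching(record, "alliance_score_*")
    # Assumes scores are numeric strings; the threshold value is illustrative only.
    return any(int(score) >= threshold for score in scores.values())
```

The rule no longer knows which feed produced the score, which is exactly the trade-off described above: it can only reason about the matched keys as a group.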
@jacknagz To ensure we're on the same page, you're outlining a con, but it's a con we'll have to deal with in order to properly support this log type and others like it, correct?
@mime-frame, @jacknagz This may be something that we want to consider defining within applicable logs in logs.json. For instance, we could use a key in the configuration map of something like "support_key_wildcards": true.
In turn, this would be something we could inspect during parsing so we only try to do wildcard parsing if a defined log type supports it. Otherwise I foresee an avoidable performance hit if we try to do magic like fnmatch for every key in every defined log.
EDIT: and we could do our best in the JSON parser to avoid forking logic too much if key wildcards are supported
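A minimal sketch of the opt-in idea, assuming each log definition is a dict with a "schema" map and an optional "configuration" map that carries the suggested "support_key_wildcards" flag. The function names are hypothetical; the point is only that fnmatch-style matching runs when a log type asks for it, and exact lookups stay on the fast path otherwise.

```python
from fnmatch import fnmatchcase


def key_in_schema(key, schema, wildcards_enabled):
    """Check one record key against the schema, using patterns only if enabled."""
    if key in schema:
        return True
    if not wildcards_enabled:
        return False
    return any(fnmatchcase(key, pattern) for pattern in schema if "*" in pattern)


def record_matches(record, log_config):
    """Return True if every key in the record is covered by the log's schema."""
    schema = log_config["schema"]
    # Assumed location of the flag; defaults to False so existing logs are unaffected.
    wildcards = log_config.get("configuration", {}).get("support_key_wildcards", False)
    return all(key_in_schema(key, schema, wildcards) for key in record)
```

Keeping the flag check inside a single helper is one way to avoid forking the JSON parser logic: the calling code is the same for both kinds of log.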