X-Pack Watcher aggregations count distinct logic
I think the Watcher aggregations output is incorrect for the count distinct case, though I'm not sure whether this is a general problem or specific to this scenario. I could also be misunderstanding how it works and trying to misuse it.
Background: I ran a password spray against 6 users. This generated 6 Windows events with Event ID 4625, and I'm writing a detection rule very similar to https://github.com/SigmaHQ/sigma/blob/08ca62cc8860f4660e945805d0dd615ce75258c1/rules/windows/builtin/win_susp_failed_logons_single_source.yml
In the Sigma condition I have this: selection | count(User) by SourceIp > 5. The selection itself isn't particularly relevant here, just the aggregation expression after the |. This generates the following aggregations in the Watcher. Note that this is two aggregations, one nested inside the other, and both have a min_doc_count of 6.
"aggs": {
"by": {
"terms": {
"field": "source.ip",
"size": 10,
"order": {
"_count": "desc"
},
"min_doc_count": 6
},
"aggs": {
"agg": {
"terms": {
"field": "user.name",
"size": 10,
"order": {
"_count": "desc"
},
"min_doc_count": 6
}
}
}
}
}
Here is the result when the aggregations run. Note that the top-level aggregation correctly identifies the IP address that all 6 logon attempts came from, but the nested aggregation returns no buckets because of its min_doc_count of 6: there is only a single failed logon event per unique user, which is exactly the signature of a password spray.
"aggregations" : {
"by" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "x.x.x.204",
"doc_count" : 6,
"agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
This is the generated condition in the Watcher. Note that it looks at the first IP bucket of the top-level aggregation and checks whether the nested aggregation's first bucket has more than 5 documents. That would only happen if there were at least 6 failed logon events from the same IP address for the same username, since those events would all land in that username's bucket.
"condition": {
"compare": {
"ctx.payload.aggregations.by.buckets.0.agg.buckets.0.doc_count": {
"gt": 5
}
}
}
But going back to the Sigma query, this is not what I intended at all. Sigma's documentation on the count() aggregation says:
The count aggregation counts all matching events if no field name is given. With field name it counts the distinct values in this field.
It seems that count(User) by SourceIp > 5 should count the distinct usernames seen from each IP address, not the number of repeated events per username from each IP address. In this dataset, the distinct username count for x.x.x.204 is 6, which is greater than 5, so the rule should match.
When I remove the min_doc_count from the username aggregation, this is the result. Note that each user gets its own bucket containing the single event for that user.
"aggregations" : {
"by" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "x.x.x.204",
"doc_count" : 6,
"agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user1",
"doc_count" : 1
},
{
"key" : "user2",
"doc_count" : 1
},
{
"key" : "user3",
"doc_count" : 1
},
{
"key" : "user4",
"doc_count" : 1
},
{
"key" : "user5",
"doc_count" : 1
},
{
"key" : "user6",
"doc_count" : 1
}
]
}
}
]
}
}
IMO, the correct thing for the watcher to do is compare the number of buckets in that username aggregation, since that is the number of unique usernames; a sketch of that idea follows.
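For example, with the min_doc_count removed from the nested terms aggregation, the backend could emit a Watcher script condition that counts the buckets. This is only a sketch under those assumptions, not the backend's actual output; the field names and the > 5 threshold are carried over from the example above.

# Sketch only: a Watcher script condition, written as the Python dict
# the backend would serialize, that fires when the first source.ip
# bucket contains more than 5 distinct user.name buckets. Assumes the
# nested terms aggregation shown above with its min_doc_count removed.
condition = {
    "script": {
        "lang": "painless",
        "source": (
            "def buckets = ctx.payload.aggregations.by.buckets; "
            "return buckets.size() > 0 "
            "&& buckets[0].agg.buckets.size() > 5;"
        ),
    }
}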
Or the backend could use something like the cardinality aggregation instead of the nested terms aggregation; a sketch of that approach follows as well. Either way, setting min_doc_count on the nested aggregation clearly doesn't work for this case.
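A hedged sketch of the cardinality approach, again only illustrating the idea with the field names from above. The Elasticsearch cardinality aggregation returns a single value per bucket, so the existing compare condition style still works:

# Sketch only: count distinct user.name values per source.ip with a
# cardinality sub-aggregation. cardinality is approximate for very high
# counts, but exact at this scale.
aggs = {
    "by": {
        "terms": {
            "field": "source.ip",
            "size": 10,
        },
        "aggs": {
            "agg": {
                "cardinality": {"field": "user.name"},
            },
        },
    }
}

# The compare condition then reads the cardinality's value rather than
# a nested bucket's doc_count.
condition = {
    "compare": {
        "ctx.payload.aggregations.by.buckets.0.agg.value": {"gt": 5},
    }
}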
The XPackWatcherBackend code here is responsible for generating this aggregation. @thomaspatzke it looks like you may have written this part?
https://github.com/SigmaHQ/sigma/blob/08ca62cc8860f4660e945805d0dd615ce75258c1/tools/sigma/backends/elasticsearch.py#L773-L807
The Elastalert and ElasticDSL backends both appear to be using cardinality for the count distinct case.
- https://github.com/SigmaHQ/sigma/blob/08ca62cc8860f4660e945805d0dd615ce75258c1/tools/sigma/backends/elasticsearch.py#L1074
- https://github.com/SigmaHQ/sigma/blob/08ca62cc8860f4660e945805d0dd615ce75258c1/tools/sigma/backends/elasticsearch.py#L488-L509
Possibly related:
- https://github.com/SigmaHQ/sigma/issues/653
Another thought: would it be feasible in the long run to have the Watcher backend use the ElasticDSL backend, similar to how ElastalertBackendDsl does?
https://github.com/SigmaHQ/sigma/blob/08ca62cc8860f4660e945805d0dd615ce75258c1/tools/sigma/backends/elasticsearch.py#L1191-L1202