pySigma-backend-splunk

Request: Using the fields: key to define the values() from a |stats command in correlation searches

Open issue, opened by joshnck (2 comments)

Currently, a correlation search can only surface data from a detection if that data is part of the detection's explicit logic or of the group-by. This is a limitation of how |stats works in Splunk: fields that are not named in an aggregation function or in the by clause are dropped from the results. If we want to give the analyst extra context from our detection, we need values() or a comparable function to carry that data from the log into the table.
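To make the limitation concrete, here is a rough Python model of the |stats behaviour described above. It is a simulation under stated assumptions, not Splunk code: the events and the stats helper are hypothetical, and only count and values() are modelled (values() returns distinct values in lexicographic order, as Splunk documents).

```python
from collections import defaultdict

# Hypothetical events as they might look after the base search.
events = [
    {"_time": 0, "process_path": "C:\\Windows", "process_name": "a.exe"},
    {"_time": 0, "process_path": "C:\\Windows", "process_name": "b.exe"},
    {"_time": 0, "process_path": "C:\\Windows", "process_name": "a.exe"},
]


def stats(rows, by, values_of=()):
    """Rough model of `stats count as event_count values(f) as f ... by <by>`.

    Only the by-fields, event_count and the values() fields survive;
    every other field of the input rows is dropped.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in by)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(by, key))
        out["event_count"] = len(members)
        for f in values_of:
            # values(): distinct values, sorted lexicographically
            out[f] = sorted({m[f] for m in members if f in m})
        result.append(out)
    return result


# Without values(): process_name is not in the output rows.
print(stats(events, by=["_time", "process_path"]))
# With values(process_name): the distinct process names are kept.
print(stats(events, by=["_time", "process_path"], values_of=["process_name"]))
```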

Currently we use fields: to build a |table for detections, which tells Splunk which fields are important for an analyst to investigate. This should carry over into the |stats command as well.

```yaml
title: Example Detection
name: base_rule
date: 2024/03/26
status: experimental
author: burnsn1
description: Test Rule
logsource:
    category: process_creation
    product: windows
detection:
    susp_exec:
        process_path:
        -   'C:\Windows'
    condition: susp_exec
fields:
-  process_path
-  process_name
---
title: Multiple occurrences of base event
correlation:
    type: event_count
    rules:
        - base_rule
    group-by:
        - process_path
    timespan: 24h
    condition:
        gte: 10
```

This should ideally convert to:

```
process_path="C:\\Windows" | table process_path,process_name
| bin _time span=24h
| stats count as event_count values(process_name) as process_name by _time process_path
| search event_count >= 10
```

This is also useful because the converted query then retains all the information relevant to the analysis. Otherwise, fields that may be needed for context are dropped.

We will need to compare the fields: values against the group-by: values, so that each field is listed only once in the final query and the search stays valid.
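A minimal sketch of that comparison, assuming the fields: and group-by: lists are available as plain Python lists. The helper build_correlation_query and its parameters are hypothetical, not part of pySigma or the Splunk backend. It skips fields already covered by group-by, keeps each remaining field once in its fields: order, and assembles the event_count query from the example above.

```python
def build_correlation_query(base_search, fields, group_by, timespan, threshold):
    """Hypothetical helper: render the event_count query proposed in this issue."""
    group_by = group_by or []
    # Keep each fields: entry once, in its original order, and skip the ones
    # already listed in group-by (they appear in the `by` clause instead).
    extra = []
    for fld in fields:
        if fld not in group_by and fld not in extra:
            extra.append(fld)
    table_expr = f" | table {','.join(fields)}" if fields else ""
    values_expr = "".join(f" values({fld}) as {fld}" for fld in extra)
    by_expr = "".join(f" {fld}" for fld in group_by)
    return (
        f"{base_search}{table_expr}\n"
        f"| bin _time span={timespan}\n"
        f"| stats count as event_count{values_expr} by _time{by_expr}\n"
        f"| search event_count >= {threshold}"
    )


query = build_correlation_query(
    base_search='process_path="C:\\\\Windows"',
    fields=["process_path", "process_name"],
    group_by=["process_path"],
    timespan="24h",
    threshold=10,
)
print(query)
```

For the example rules, this prints the same four-line query as the target conversion: process_path is only in the by clause, and process_name is carried through values().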

joshnck, Apr 17 '24 12:04

To implement this, imo we need to work on convert_correlation_aggregation_from_template and make a fields variable available to the *_aggregation_expression templates. This variable would contain every field listed in each referenced rule that is not already listed in group-by. The remaining question is whether it should be implemented in pySigma or in the specific backends, here the Splunk backend.

  • The quicker option could be to implement it in pySigma and inject the fields directly into the template, but that could be heavy if only Splunk needs it.
  • In the Splunk backend, I found a way to make it work, but it's a little bit dirty:
```python
# Method of the Splunk backend class
def convert_correlation_aggregation_from_template(
    self, rule, correlation_type, method, search
):
    # get templates (same lookup as super().convert_correlation_aggregation_from_template)
    attr = f"{correlation_type}_aggregation_expression"
    templates = getattr(self, attr)
    if templates is None:
        raise NotImplementedError(
            f"Correlation type '{correlation_type}' is not supported by backend."
        )
    # collect the fields of every referenced rule, excluding those handled by group-by
    group_by = rule.group_by or []
    fields = set()
    for rl in rule.rules:
        for fld in rl.rule.fields:
            if fld not in group_by:
                fields.add(fld)
    # sorted() keeps the generated query stable between runs
    fields_expr = " ".join(f"values({fld}) as {fld}" for fld in sorted(fields))
    # Substitute {fields} in a copy: str.replace leaves all other placeholders intact,
    # and the shared class-level dict keeps its {fields} placeholder for the next rule.
    patched = dict(templates)
    patched[method] = templates[method].replace("{fields}", fields_expr)
    setattr(self, attr, patched)
    try:
        return super().convert_correlation_aggregation_from_template(
            rule, correlation_type, method, search
        )
    finally:
        setattr(self, attr, templates)
```
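One side effect to check when the template dict is rewritten in place (templates[method] = templates[method].format(...)): the dict is a class-level attribute shared by every conversion, so once {fields} has been substituted, later correlation rules silently reuse the first rule's values() list. The stand-alone sketch below reproduces this with a plain dict; the template string mirrors the event_count example, while render_in_place and render_with_copy are hypothetical helpers. Substituting into a local copy avoids the problem.

```python
TEMPLATES = {
    "stats": "| bin _time span={timespan}\n"
    "| stats count as event_count {fields} by _time{groupby}",
}


def values_expr(fields):
    return " ".join(f"values({f}) as {f}" for f in sorted(fields))


def render_in_place(templates, fields):
    # Overwrites the shared entry: {fields} is consumed on the first call.
    templates["stats"] = templates["stats"].format(
        fields=values_expr(fields), timespan="{timespan}", groupby="{groupby}"
    )
    return templates["stats"].format(timespan="24h", groupby=" process_path")


def render_with_copy(templates, fields):
    # Substitutes into a local string; the shared entry keeps {fields}.
    template = templates["stats"].replace("{fields}", values_expr(fields))
    return template.format(timespan="24h", groupby=" process_path")


shared = dict(TEMPLATES)
first = render_in_place(shared, ["process_name"])
second = render_in_place(shared, ["user"])
print("values(user)" in second)  # False: the second rule got the first rule's values()

fresh = dict(TEMPLATES)
render_with_copy(fresh, ["process_name"])
third = render_with_copy(fresh, ["user"])
print("values(user)" in third)  # True
```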

Example of an event_count template:

```python
event_count_aggregation_expression: ClassVar[Dict[str, str]] = {
    "stats": "| bin _time span={timespan}\n| stats count as event_count {fields} by _time{groupby}",
}
```
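If pySigma itself exposed fields as a regular template variable (the first option above), no placeholder juggling would be needed: the base class could pass it to str.format together with timespan and groupby. A minimal sketch using the event_count template above; render_event_count and its parameters are hypothetical, and pySigma's real convert_correlation_aggregation_from_template may pass other variables as well.

```python
from typing import Dict, List, Optional

# Same template string as the event_count example.
EVENT_COUNT_AGGREGATION_EXPRESSION: Dict[str, str] = {
    "stats": "| bin _time span={timespan}\n| stats count as event_count {fields} by _time{groupby}",
}


def render_event_count(
    method: str,
    timespan: str,
    group_by: Optional[List[str]],
    rule_fields: List[List[str]],
) -> str:
    """Hypothetical pySigma-side rendering: `fields` is just another format variable."""
    group_by = group_by or []
    # Union of the fields of all referenced rules, minus group-by, in first-seen order.
    fields: List[str] = []
    for flds in rule_fields:
        for fld in flds:
            if fld not in group_by and fld not in fields:
                fields.append(fld)
    template = EVENT_COUNT_AGGREGATION_EXPRESSION[method]
    return template.format(
        timespan=timespan,
        groupby="".join(f" {g}" for g in group_by),
        fields=" ".join(f"values({f}) as {f}" for f in fields),
    )


print(render_event_count("stats", "24h", ["process_path"], [["process_path", "process_name"]]))
```

Because every backend would receive fields the same way, backends that don't need it can simply leave {fields} out of their templates, since str.format ignores unused keyword arguments.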

@thomaspatzke, I look forward to your thoughts on this idea and, if it's OK, on where you'd prefer the PR to be made.

arblade, Apr 11 '25 09:04

I prefer pySigma. In my experience, such additions are often needed by other backends too, and it shouldn't hurt.

thomaspatzke, Apr 11 '25 11:04