kibana icon indicating copy to clipboard operation
kibana copied to clipboard

[Security Solution] Allow users to edit max_signals field for custom rules

Open approksiu opened this issue 1 year ago • 9 comments

Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168 Related bug: https://github.com/elastic/kibana/issues/164234

Summary

We need to expose the max signals field in the rule details, and rule edit page. Check the validity of the field (NOTE: To avoid rule failures, do not set the max_signals value higher than the value of xpack.alerting.rules.run.alerts.max. Provide default value (100 - according to the docs).

Background

We want to allow users more customisation options for their custom rules. Max_signals field sets the maximum number of alerts that will be created during a single rule execution.

User story

  • [x] As a user, I want to be able to specify max_signals fields for my custom rules.

Acceptance criteria

  • [x] Rule edit page shows the max_signals field in the About Rule section under advanced settings.
  • [x] User can edit the max_signals value on the rule edit page.
  • [ ] On the rule edit page, there is a tooltip explaining what this field does, (and a link to documentation?).
  • [x] Rule details page shows the max_signals field in the About Rule section.
  • [ ] When creating a new rule, the max_signals filed default value is set to 100.
  • [x] If the user sets value higher than xpack.alerting.rules.run.alerts.max - a warning is shown, but the rule can be saved.
  • [x] If the user sets value higher than xpack.alerting.rules.run.alerts.max - a warning is thrown during rule execution and the lower value of the two will be used for an upper limit.

Release progress

  • [x] Initial implementation is done. We're going to release this enhancement in a single PR, so it's going to be open until everything is ready for release. (PR)
  • [x] Feature is covered with automated tests. (PR)
  • [x] Feature is fully implemented and considered by the development team as ready to be released.
  • [ ] Acceptance testing is done and the feature is approved by @approksiu and @ARWNightingale.
  • [ ] Exploratory testing is done and the feature is approved by @vgomez-el.
  • [x] Documentation is written for ESS and Serverless by @joepeeples. Two docs PRs are approved and ready to be merged. (ticket)
  • [ ] Feature is released in Serverless.

Planned release date in Serverless: TBD. Planned release date in ESS: TBD (v8.15.0).

approksiu avatar Dec 19 '23 09:12 approksiu

TODO:

  • find out how (and if it is possible) to expose the xpack.alerting.rules.run.alerts.max setting to the frontend. Alternatively, consider throwing an error/warning only onSave

Update: This value is exposed here and has to be accesible in the rules client by adding it to:

  1. As a new parameter in the plugin initialization, similarly to how it's done for other props like minimumScheduleInterval: https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/plugin.ts#L508
  2. In the RuleClient factory, added to the initialize and create methods (where the RulesClient class is instantiated): https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/rules_client_factory.ts
  3. In the RuleClient class, passed to the context in the constructor: https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/rules_client/rules_client.ts#L116

And that will make it available in the context object within methods such as update.

jpdjere avatar Jan 05 '24 12:01 jpdjere

@approksiu @ARWNightingale

I don't think we really need designs for this (maybe for link to docs, and explanation in tooltip?), but I think we can agree on using the Number field from EUI:

image

jpdjere avatar Jan 10 '24 11:01 jpdjere

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine avatar Mar 07 '24 14:03 elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine avatar Mar 07 '24 14:03 elasticmachine

Deployment links have been added to the PR in the latest build message for acceptance and exploratory testing. cc @vgomez-el @approksiu

dplumlee avatar Apr 03 '24 14:04 dplumlee

All of our logic changes are now (hopefully) complete and the tickets/acceptance criteria have been updated to match. The new deployment links should be added to the latest build message. Would appreciate some eyes when you have a chance, thanks! cc: @vgomez-el @approksiu @ARWNightingale

dplumlee avatar Apr 16 '24 19:04 dplumlee

The latest deployment links are in the most recent build message. Would appreciate some eyes for acceptance/exploratory testing when everyone has a chance, thanks! cc: @vgomez-el @approksiu @ARWNightingale

dplumlee avatar Apr 26 '24 01:04 dplumlee

@dplumlee I did a quick testing in the ESS Cloud deployment and found a few issues:


First, an attempt to reset max_signals to an empty value (undefined) on the Rule Editing page leads to a 400 error:

Screenshot 2024-04-26 at 12 21 35 Screenshot 2024-04-26 at 12 21 45

Request body sent to the PUT /api/detection_engine/rules endpoint:

{"type":"query","index":["apm-*-transaction*","auditbeat-*","endgame-*","filebeat-*","logs-*","packetbeat-*","traces-apm*","winlogbeat-*","-*elastic-cloud-logs-*"],"filters":[],"language":"kuery","query":"*","author":[],"exceptions_list":[],"false_positives":[],"references":[],"risk_score":21,"risk_score_mapping":[],"severity":"low","severity_mapping":[],"threat":[],"max_signals":"","name":"Georgii's test","description":"-","tags":[],"setup":"","license":"","interval":"5m","from":"now-360s","to":"now","meta":{"from":"1m","kibana_siem_app_url":"https://kibana-pr-179680.kb.us-west2.gcp.elastic-cloud.com:9243/app/security"},"actions":[],"enabled":false,"id":"929893cf-5021-44cb-8b99-01dea17108d3"}
"max_signals":""

Steps to reproduce:

  1. Create a new rule with default settings, leave max_signals equal to 100 in the form.
  2. On the Rule Details page, click "Edit rule settings".
  3. In About -> Advanced settings, clear the "Max alerts per run" input.
  4. Click "Save changes".

Same eventually happens on rule creation, but the form allows to proceed from the About rule step to the next steps without showing any validation errors:

  1. Open the Rule Creation page.
  2. Fill in Custom query, click Continue.
  3. In the About rule step, fill in Name and Description, expand Advanced settings.
  4. Clear the "Max alerts per run" input, click Continue.
  5. The page allows you to proceed to the next step, click Continue, and click Save.
  6. You see the same 400 error.

I'd expect the field to be optional in the UI, and defaultable on the BE side (if max_signals is undefined in the request body, we set it to 100).


Secondly, in the AC we have:

On the rule edit page, there is a tooltip explaining what this field does, (and a link to documentation?).

Have we decided to not add this tooltip?


Screenshot 2024-04-26 at 12 39 11

Nit: In this warning, we should probably say The rule will only write a maximum of 1000 alerts per rule run..

Also not sure about using the max_signals field name in the text, but I don't know if we use any other field names in our status messages or logs. We could replace "max_signals value" with "Max alerts per run setting".

banderror avatar Apr 26 '24 10:04 banderror

@banderror This has been fixed in the most recent build, a leftover remnant from refactoring logic.

Secondly, in the AC we have:

On the rule edit page, there is a tooltip explaining what this field does, (and a link to documentation?).

Have we decided to not add this tooltip?

I didn't see that in the AC before, but I would say the description we have underneath the field in the form would suffice for this component. It's pretty straightforward and will have been documented externally - I'm struggling to think of what more we would even put in there. @approksiu might have a different opinion

dplumlee avatar Apr 26 '24 21:04 dplumlee

@ARWNightingale @vgomez-el Any update on acceptance/exploratory testing for this component? Hoping to merge before the next serverless branch on monday.

dplumlee avatar May 02 '24 18:05 dplumlee

@ARWNightingale @vgomez-el cc @dplumlee

Here are the credentials so that you can access ESS and Serverless environments for testing:

ESS: Env Url: https://kibana-pr-179680.kb.us-west2.gcp.elastic-cloud.com:9243/ User and password: https://p.elstc.co/paste/s7aUuzef#3dFyPMsGXJ50ecNnK5kDBUkiy3CGEOV8JQ+UJN6P5Dx

Serverless: Env URL: https://kibana-pr-179680-security-af6bdf.kb.eu-west-1.aws.qa.elastic.cloud/login User and password: https://p.elstc.co/paste/w7k0D-wH#QvDSl6IaCj7MLlcyW6oy-lYFU7XwlOYH+p6njH9SRqr

Remember to select the option Login with Elasticsearch to input the user and passwords retrieved from the links above.

jpdjere avatar May 03 '24 08:05 jpdjere

Looks good to me, That environment has just gone down on me but one small issue is when I put 0 into the input I get an error saying " Max alerts must be greater than 0". I can still save as it just reverts it too the previous value. I think we should add that to the warning. " Max alerts must be greater than 0. The value will be set to default when saved, unless changed" or something similar.

Apart from that, nice work!

ARWNightingale avatar May 03 '24 08:05 ARWNightingale

Agree with @ARWNightingale comment above, but I think the validation should fail on the API request, same as when the the value is negative. Left a comment in the PR on how to fix that.

jpdjere avatar May 03 '24 10:05 jpdjere

Feature LGTM, but I also have a couple of comments:

  • In my opinion, although warning messages are displayed in several places, it is a bit strange to let the user save large numbers in the max_alerts field, but then only generate the maximum alerts set in [xpack.alerting.rules.run.alerts.max.] I am referring more to allowing large values to be saved, than to how or how many alerts are generated:(https://www.elastic.co/guide/en/kibana/8.11/alert-action-settings-kb.html#alert-settings)
image
  • Field is catching nice rare values, But the error message is unrefined and shows that [request body] String in the message that looks a bit ugly ( And I was editing a rule, not adding a new one)
image image

vgomez-el avatar May 03 '24 12:05 vgomez-el

The feature has been live in Serverless for a while.

We also had a discussion on what should we do with 18 prebuilt rules that have max_signals equal to 10000 (context). Summary of the discussion:

  • TRaDE needs this 10k setting in those rules because those rules are "promotion" rules, and should be able to generate as many detection alerts from external alerts as possible. Users will be able to achieve that via tweaking the xpack.alerting.rules.run.alerts.max system setting. Updating max_signals to a lower value is not an option.
  • We agreed that it would be worth documenting that those rules that have max_signals > xpack.alerting.rules.run.alerts.max will be generating at most xpack.alerting.rules.run.alerts.max alerts per execution (1000 by default).
    • @joepeeples documented this in the public docs (PR).
    • @Mikaayenson updated setup guides of those 18 rules with that information (PR).
  • We agreed that @approksiu will collect some telemetry on rules hitting the max_signals circuit breaker.
  • We also discussed, with no particular action items:
    • That we could show a dismissable "FYI" callout on the Rule Details page when max_signals of the rule is higher than the system setting.
    • That severity of the Warning status might be too high. @approksiu disagreed with that.
    • That we should give a heads up to the user before they enable or maybe even install a prebuilt rule regarding what needs to be done to make this rule work (setup guide, caveats). This would be a separate UX epic to work on, and @approksiu had some thoughts on that.
    • That we can ask ResponseOps if we could force max_signals and override the system setting with it, for certain rules.
    • That users can override the xpack.alerting.rules.run.alerts.max setting in Cloud ESS.

@dplumlee Let's keep the behavior of the Warning status as is. My concerns did not get support from product, but I'm totally fine with that. I don't think we have any other immediate action items related to this feature (we can always revisit later), so I'll close the ticket. cc @jpdjere

Thanks to everyone involved! We'll track https://github.com/elastic/security-docs/issues/5029 separately.

banderror avatar May 22 '24 10:05 banderror

Here's the evidence about rules hitting max_signals and not saving all alerts, based on cloud telemetry. ( I am looking to find the data for on-prem for this issues.) In the past 30 days, we had about 10-12 clusters on average reporting errors when hitting max signals, of which about 5-9 clusters reporting this issues for prebuilt promotion rules. In this data we track > 3200 clusters that have reported data with rule mentioned. The most common rule is External Alerts.

approksiu avatar Aug 22 '24 14:08 approksiu

After more research here're the latest numbers: doc.

approksiu avatar Aug 27 '24 09:08 approksiu