[Synthetics] Improve synthetics alerting
Summary
Fixes https://github.com/elastic/kibana/issues/175298
Improves synthetics alerting!
Users will be able to create a custom synthetics status alert by defining three kinds of criteria (see the sketch below):
- Consecutive checks: the monitor is down over the last N consecutive checks, with a configurable threshold
- From-locations threshold: the monitor is considered down only when it is down from a defined number of locations
- Over time with a checks threshold, just like the uptime custom status alert
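To make these criteria concrete, here's a minimal TypeScript sketch of what the rule parameters could look like. The shape and all field names (StatusRuleCondition, consecutiveChecks, fromLocations, overTime) are hypothetical illustrations, not the actual types used by the Synthetics plugin:

```ts
// Hypothetical shape of the custom status rule criteria described above;
// field names are illustrative, not the actual Kibana implementation.
interface StatusRuleCondition {
  // Criterion 1: the monitor is down over the last N consecutive checks.
  consecutiveChecks?: { count: number };
  // Criterion 2: the monitor is considered down only when it is down
  // from at least N locations.
  fromLocations?: { count: number };
  // Criterion 3: the monitor is down for X of the last Y checks within a
  // time window, like the uptime custom status alert.
  overTime?: { downChecks: number; totalChecks: number; windowMinutes: number };
}

// Example: alert when a monitor is down from at least 2 locations,
// for 5 of its last 10 checks within the past 30 minutes.
const exampleCondition: StatusRuleCondition = {
  fromLocations: { count: 2 },
  overTime: { downChecks: 5, totalChecks: 10, windowMinutes: 30 },
};
```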
/ci
:robot: GitHub comments
Just comment with:
- /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
- run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
:broken_heart: Build Failed
- Buildkite Build
- Commit: d180f77ca387cb000cbfb826f253cf1bebb39d12
Failed CI Steps
- FTR Configs #28
- Check Types
- Synthetics @elastic/synthetics Tests
Test Failures
- [job] [logs] FTR Configs #28 / Synthetics API Tests GetMonitorsOverview accepts search queries
Metrics [docs]
Module Count
Fewer modules leads to a faster build time
| id | before | after | diff |
|---|---|---|---|
| synthetics | 914 | 1079 | +165 |
Public APIs missing comments
Total count of every public API that lacks a comment. Target amount is 0. Run
node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 4 | +4 |
Async chunks
Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app
| id | before | after | diff |
|---|---|---|---|
| synthetics | 854.8KB | 1.1MB | :warning: +247.7KB |
| triggersActionsUi | 1.7MB | 1.7MB | +119.0B |
| total | | | +247.8KB |
Public APIs missing exports
Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0.
Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 1 | +1 |
Page load bundle
Size of the bundles that are downloaded on every page load. Target size is below 100kb
| id | before | after | diff |
|---|---|---|---|
| synthetics | 19.4KB | 21.7KB | +2.3KB |
Unknown metric groups
API count
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 4 | +4 |
async chunk count
| id | before | after | diff |
|---|---|---|---|
| synthetics | 8 | 9 | +1 |
ESLint disabled line counts
| id | before | after | diff |
|---|---|---|---|
| synthetics | 84 | 83 | -1 |
Total ESLint disabled count
| id | before | after | diff |
|---|---|---|---|
| synthetics | 91 | 90 | -1 |
History
- :broken_heart: Build #222448 failed 424bc3d82b8c20260c804d068a3ccd7c4aa27604
- :broken_heart: Build #222419 failed 7c19f87c96ad1e29ea6ff47f91b814e2d911bfee
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
@elasticmachine merge upstream
/oblt-deploy
When selecting a monitor from the monitor filter, the monitor ID appears in the selection box rather than the name.
Is this expected?
https://github.com/user-attachments/assets/d4156882-0117-4e8e-8d08-b72c0992683b
And does this info mean that users will receive one alert per monitor? I wonder if we can make it clearer to the user what exactly this means. cc @maciejforcone @paulb-elastic
@maryam-saeidi, yes, the behaviour is that if you specify from at least 1 location (i.e. only 1) then you can select the toggle to get alerts from each location independently (this is how it works today as each monitor+location pair is considered independently).
When defining >1 you cannot select this option because all locations are now being considered and there needs to be a failure from more than one for the alert to be sent (so you can't opt to have them separately per location).
To your point though, this is not very clear and UX/wording could be improved.
I don't know if we do this elsewhere (@maciejforcone?), maybe when selecting >1, instead of the toggle disappearing, it is disabled (greyed) and a description added next to it?
I'm not convinced this wording is clear, but something along the lines of "you can't select to send alerts per location if the condition is checking FROM AT LEAST multiple locations" (@florent-leborgne, perhaps you could suggest something better here?)
@paulb-elastic happy to help, but I'm not sure I understand correctly what this does. When the toggle is available, what is the difference when it's off vs. when it's on?
Thanks, @paulb-elastic, for the clarification.
Regarding the info message:
Monitor alerts are always grouped by monitor ID.
Can we maybe clarify it to: "Separate alerts will be sent per monitor" or something like that, if that is what we mean?
@maryam-saeidi
Can we maybe clarify it to: "Separate alerts will be sent per monitor"
Yes - good shout (sorry, I hadn't picked up on that from your earlier screenshot). I think your description is much better.
@florent-leborgne
happy to help, but I'm not sure I understand correctly what this does. When the toggle is available, what is the difference when it's off vs. when it's on?
Yeah - very fair point - as I wrote that example description I wasn't happy with how to shrink this down to a single, short line, so I'm not surprised it doesn't make sense.
I'll try to elaborate here and not be limited by short space.
Firstly, some background... monitors are run independently from each of the locations they are configured to run from. So, you can have My Monitor monitoring My Website from LocationA, LocationB and LocationC.
At the moment, the monitor+location is completely independent and will alert/recover independently. For example, if My Website goes down completely, then each independent test from LocationA, LocationB and LocationC will begin to fail, and I will get 3 alerts for My Monitor: from LocationA, LocationB and LocationC, separately.
Let’s imagine that My Website has not gone down, but there is some internet connectivity problem in the region of LocationA. In that scenario, the monitor running in LocationA will go DOWN, but those in LocationB and LocationC are UP. At the moment (current behaviour in Synthetics), you would get an alert from LocationA only.
This is a fair and reasonable scenario that many users will want from Synthetics. However, there are other users who only want to know about the problem if multiple locations report it. This could be because, for example, they know there are sometimes transient issues from certain regions that they cannot do much to resolve (so such an alert is just noise).
In those cases, users may want the behaviour "only alert me when at least 2 locations report that there is a problem". In our example, this could be from any of LocationA, LocationB or LocationC, but importantly, only generate an alert if >1 of the locations report a problem.
This new rule allows users to configure that.
| Send alert per location | From at least | Behaviour |
|---|---|---|
| Off | = 1 | As soon as there is a failure from any location, send an alert. You will only get 1 alert for that monitor, regardless of how many locations report the failure. |
| On | = 1 | As soon as a location detects a failure, send an alert. You will receive separate alerts from any and all locations that identify the failure. |
| n/a | > 1 (n) | When at least n locations identify the failure, send an alert. You will only get 1 alert for that monitor, regardless of how many locations report the failure (as long as it's from at least n locations). |
(I appreciate that it’s not the location that actually sends the alert, but the rule that runs in Kibana and looks for those scenarios, but have written it like that as I think it’s clearer to understand the behaviour)
I am not sure how to boil that down into a simple description line of text next to the toggle 😬
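To make the table above concrete, here is a minimal TypeScript sketch of that decision logic. The alertsToFire function and AlertConfig shape are hypothetical illustrations of the behaviour described, not the actual rule implementation:

```ts
// Hypothetical sketch of the decision table above; not the actual rule code.
interface AlertConfig {
  fromAtLeast: number; // the "From at least" locations setting
  perLocation: boolean; // the "Send alert per location" toggle (only meaningful when fromAtLeast === 1)
}

// Given the locations currently reporting a failure for a monitor, return
// the alert keys to fire: one per monitor, or one per monitor+location.
function alertsToFire(monitorId: string, downLocations: string[], config: AlertConfig): string[] {
  if (config.fromAtLeast > 1) {
    // Toggle is off and disabled: a single alert per monitor, and only once
    // at least `fromAtLeast` locations report the failure.
    return downLocations.length >= config.fromAtLeast ? [monitorId] : [];
  }
  if (config.perLocation) {
    // Separate alerts for each failing location.
    return downLocations.map((loc) => `${monitorId}-${loc}`);
  }
  // One alert per monitor as soon as any location fails.
  return downLocations.length >= 1 ? [monitorId] : [];
}
```

For example, with fromAtLeast: 2, a failure from LocationA alone fires nothing, while failures from LocationA and LocationB fire a single alert for the monitor.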
Thanks for the details @paulb-elastic!
I would suggest one of the following solutions (hopefully I understood it correctly now 😅):
- Renaming the toggle to "Receive distinct alerts for each location". We could even append a tooltip next to it if we feel it's necessary, to say something like: "When the monitor detects a failure on one or more locations, you receive an alert for each of these locations, instead of a single alert."
- Use 2 radio buttons instead of a toggle; that way we can explain the behaviour for each state (for example, the one selected by default could be "Receive a unique alert", with a tooltip to keep the label short and still explain more: "When the monitor detects a failure on one or more locations, you get only one alert.", and the second option would be "Receive distinct alerts for each location")
Would that solve the issue?
Thanks @florent-leborgne, these are great suggestions!
I think the toggle (first suggestion) might be slightly better in that it fits in more with the other configuration controls we have in general (at least, from what I can tell), although there's not much in it if others think the radio buttons would be better.
I like your wording - it's a lot clearer.
I was thinking about how it "disappears" too, where I suggested above that we disable it (grey it out). I think if the user selects >1, we should also switch the toggle off (as well as it being disabled so the user cannot switch it on), as that also describes the behaviour (i.e. you WON'T receive distinct alerts for each location).
Updated copies
Disabled state
Flaky Test Runner Stats
🟠 Some tests failed. - kibana-flaky-test-suite-runner#7012
[❌] x-pack/test/alerting_api_integration/observability/config.ts: 25/50 tests passed.
Flaky Test Runner Stats
🎉 All tests passed! - kibana-flaky-test-suite-runner#7017
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 50/50 tests passed.
Flaky Test Runner Stats
🎉 All tests passed! - kibana-flaky-test-suite-runner#7018
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 50/50 tests passed.
:green_heart: Build Succeeded
- Buildkite Build
- Commit: c9fa0da1c8f181f9378dafecd8fc9fa2c2e44dc3
- Kibana Serverless Image:
docker.elastic.co/kibana-ci/kibana-serverless:pr-186585-c9fa0da1c8f1
Metrics [docs]
Module Count
Fewer modules leads to a faster build time
| id | before | after | diff |
|---|---|---|---|
| synthetics | 1019 | 1154 | +135 |
Public APIs missing comments
Total count of every public API that lacks a comment. Target amount is 0. Run
node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 4 | +4 |
Async chunks
Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app
| id | before | after | diff |
|---|---|---|---|
| synthetics | 964.5KB | 1.2MB | :warning: +237.5KB |
| triggersActionsUi | 1.6MB | 1.6MB | +115.0B |
| total | | | +237.6KB |
Public APIs missing exports
Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0.
Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 1 | +1 |
Page load bundle
Size of the bundles that are downloaded on every page load. Target size is below 100kb
| id | before | after | diff |
|---|---|---|---|
| synthetics | 36.9KB | 37.1KB | +221.0B |
Unknown metric groups
API count
| id | before | after | diff |
|---|---|---|---|
| @kbn/observability-synthetics-test-data | - | 4 | +4 |
ESLint disabled line counts
| id | before | after | diff |
|---|---|---|---|
| @kbn/synthetics-e2e | 19 | 18 | -1 |
| @kbn/test-suites-xpack | 721 | 722 | +1 |
| synthetics | 49 | 52 | +3 |
| total | | | +3 |
Total ESLint disabled count
| id | before | after | diff |
|---|---|---|---|
| @kbn/synthetics-e2e | 20 | 19 | -1 |
| @kbn/test-suites-xpack | 745 | 746 | +1 |
| synthetics | 55 | 58 | +3 |
| total | | | +3 |
History
- :yellow_heart: Build #238291 was flaky b89059c5047ef0fd88682cec5b969498ba89415b
- :green_heart: Build #238129 succeeded 6fec0e2ded4339dd76d324f4f6816558b0618f2e
- :broken_heart: Build #238058 failed 27c2816fd4029168ee1559fd24164fb4c941f10b
- :broken_heart: Build #237898 failed 2b4f9715cba693eeae11658dee07baf5ffc63628
- :broken_heart: Build #237731 failed cf339ed630d4ff79b7ff84a601fc656192e421ba
To update your PR or re-run it, just comment with:
@elasticmachine merge upstream
Starting backport for target branches: 8.x
https://github.com/elastic/kibana/actions/runs/11129829761
💔 All backports failed
| Status | Branch | Result |
|---|---|---|
| ❌ | 8.x | Backport failed because of merge conflicts. You might need to backport the following PRs to 8.x: [Synthetics] Remove dead code (#193335) |
Manual backport
To create the backport manually run:
node scripts/backport --pr 186585
Questions?
Please refer to the Backport tool documentation
💚 All backports created successfully
| Status | Branch | Result |
|---|---|---|
| ✅ | 8.x |
Note: Successful backport PRs will be merged automatically after passing CI.
Questions?
Please refer to the Backport tool documentation