
[Metrics UI] Improve API error handling for ES queries

Open: neptunian opened this issue 3 years ago • 6 comments

Currently, if an API request fails, this message is returned:

[Screenshot (2021-03-31, 1:14:57 PM): error message shown in the Metrics UI]

The above error was returned by /api/metrics/snapshot after an ES request failed with a 503 because it exceeded max_buckets.

In Stack Monitoring, the same failure shows the error returned by ES itself:

[Screenshot (2021-03-31, 1:08:59 PM): the same error in Stack Monitoring]

And this is how it looks in APM:

[Screenshot (2021-04-02, 1:25:07 PM): the same error in APM]

Should we improve our error messages when an ES query fails so the user has more information? In the case above, for example, the user would then be able to take meaningful action to resolve the issue on their own.
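A minimal TypeScript sketch of the kind of improvement proposed here: pulling the most specific reason out of an Elasticsearch error response so it can be shown to the user instead of a generic message. It assumes the usual ES error body shape (error.type, error.reason, error.root_cause, error.caused_by, status); the helpers describeEsError and deepestReason are hypothetical, not existing Kibana functions, and the sample body is illustrative, loosely modeled on a max_buckets failure rather than captured output.

```typescript
// Subset of the Elasticsearch error response body. ES reports the failure
// under `error`, may list `root_cause` entries, and nests underlying
// failures under `caused_by`.
interface EsErrorCause {
  type: string;
  reason?: string;
  caused_by?: EsErrorCause;
}

interface EsErrorBody {
  error?: EsErrorCause & { root_cause?: EsErrorCause[] };
  status?: number;
}

// Hypothetical helper: follow the `caused_by` chain and return the deepest
// cause that carries a non-empty reason.
function deepestReason(cause: EsErrorCause): EsErrorCause | undefined {
  let found: EsErrorCause | undefined;
  for (let c: EsErrorCause | undefined = cause; c; c = c.caused_by) {
    if (c.reason) found = c;
  }
  return found;
}

// Hypothetical helper: build a user-facing message from an ES error body.
// Prefers a root_cause entry with a reason, then the caused_by chain, and
// falls back to a generic message when ES gave nothing useful.
function describeEsError(body: EsErrorBody, fallback = "An error occurred"): string {
  const error = body.error;
  if (!error) return fallback;
  const cause = error.root_cause?.find((c) => c.reason) ?? deepestReason(error);
  return cause?.reason ? `${cause.type}: ${cause.reason}` : fallback;
}

// Illustrative body loosely modeled on a max_buckets failure.
const example: EsErrorBody = {
  status: 503,
  error: {
    type: "search_phase_execution_exception",
    reason: "",
    root_cause: [],
    caused_by: {
      type: "too_many_buckets_exception",
      reason: "Trying to create too many buckets.",
    },
  },
};

console.log(describeEsError(example));
// prints: too_many_buckets_exception: Trying to create too many buckets.
```

Checking root_cause first covers failures where the top-level reason is generic but a root cause is specific; the caused_by walk covers cases like the one above, where root_cause is empty and the actionable message is nested.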

neptunian, Apr 1, 2021, 19:04

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

elasticmachine, Apr 1, 2021, 19:04

Hey @neptunian, any idea how often this might happen or what might cause it?

I'm trying to get a sense of how much value fixing this would give (e.g. are 1% of users hitting this, or is it happening every day for most users?) and how simple a solution might be (e.g. can we provide a simple error message that explains what the user should do?).

roshan-elastic, Jul 10, 2023, 09:07

I don't know how often it might happen. In this example, the user could have changed their max_buckets setting from the default, which can cause max_buckets errors for queries like the ones in our UIs. In the Stack Monitoring case, a user who saw the max_buckets error could connect the dots, but in Inventory they might not understand because no details are given. A quick improvement would be to pass the ES error down to the toast and make the toast red instead of yellow. There also appear to be some linked issues that aim to address this holistically.
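A minimal sketch of the quick client-side improvement described above: if the API forwarded an ES failure reason, show it in a red (danger) toast; otherwise keep a generic yellow (warning) toast. The ApiError shape, its esReason field, the toast titles, and toastForApiError are all hypothetical; in Kibana the resulting toast would presumably be passed to the notifications toasts service rather than returned.

```typescript
// Toast colors: yellow for warnings, red for errors.
type ToastColor = "warning" | "danger";

interface Toast {
  color: ToastColor;
  title: string;
  text?: string;
}

// Hypothetical error shape received by the client. `esReason` assumes the
// server route forwards the ES failure reason (e.g. a max_buckets message).
interface ApiError {
  statusCode?: number;
  message?: string;
  esReason?: string;
}

// Hypothetical helper: choose the toast for a failed API request.
function toastForApiError(err: ApiError): Toast {
  if (err.esReason) {
    // A specific ES reason is actionable, so surface it in a red toast.
    const status = err.statusCode ? ` (${err.statusCode})` : "";
    return {
      color: "danger",
      title: `Elasticsearch query failed${status}`,
      text: err.esReason,
    };
  }
  // No details were forwarded: keep a generic warning.
  return { color: "warning", title: "Something went wrong", text: err.message };
}

const toast = toastForApiError({
  statusCode: 503,
  esReason: "Trying to create too many buckets.",
});
console.log(toast.color, toast.title);
// prints: danger Elasticsearch query failed (503)
```

Keeping the color decision in one place means the generic yellow toast remains the fallback, and only errors that carry real ES details are escalated to red.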

neptunian, Jul 10, 2023, 19:07

Thanks for the context, @neptunian. I looked for telemetry on these messages, and there doesn't appear to be any explicit telemetry implemented for the error toasts.

I'll prioritise this as low, but I think it's a good idea for error messaging to have telemetry deployed as standard so we can understand the impact.

Question: do you know the best way to implement telemetry?

For example, should we speak to the team that manages error handling (e.g. someone in Platform) and ask them to deploy something we can use?
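One possible shape for error-toast telemetry, sketched in TypeScript under assumptions: wrap whatever function shows the error toast so that each displayed error also emits an event recording the failing route and ES error type. The Reporter callback, the event name error_toast_shown, and the event fields are hypothetical placeholders for whichever telemetry service the owning team provides.

```typescript
// Hypothetical event payload; field names are illustrative only.
interface ErrorToastEvent {
  route: string; // API route that failed, e.g. "/api/metrics/snapshot"
  errorType: string; // ES error type, or "unknown" if none was forwarded
}

// Stand-in for a real telemetry client.
type Reporter = (eventType: string, event: ErrorToastEvent) => void;

// Wrap the function that displays an error toast so every displayed error
// is also reported, without changing the toast itself.
function withToastTelemetry(
  showToast: (message: string) => void,
  report: Reporter,
): (message: string, event: ErrorToastEvent) => void {
  return (message, event) => {
    // Report first so the event is recorded even if rendering the toast throws.
    report("error_toast_shown", event);
    showToast(message);
  };
}

// Usage with console stand-ins for the toast and telemetry services.
const notify = withToastTelemetry(
  (message) => console.log(`toast: ${message}`),
  (eventType, event) => console.log(eventType, event.route, event.errorType),
);
notify("Trying to create too many buckets.", {
  route: "/api/metrics/snapshot",
  errorType: "too_many_buckets_exception",
});
```

Counting events by errorType would answer the question raised earlier in the thread, i.e. whether max_buckets failures affect a handful of users or most of them.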

roshan-elastic, Jul 24, 2023, 09:07

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

elasticmachine, Nov 14, 2023, 00:11

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic[bot], May 12, 2024, 00:05

We're not planning this at this time.

smith, May 18, 2024, 03:05