[Metrics UI] No anomalies from any jobs being returned when one job is missing

Open neptunian opened this issue 3 years ago • 3 comments

If an ML job isn't created correctly, or the user deletes a job, no anomalies show up in the Anomaly Flyout. The error is caught but never rethrown or surfaced, so the user has no way of knowing what happened. I noticed this after seeing anomalies appear for k8s memory in the timeline but not in the Anomaly Flyout table. The fetch for k8s anomalies in the table failed with the error: MLJobNotFound [Error]: No known job with id 'kibana-metrics-ui-default-default-k8s_network_out' before it could query kibana-metrics-ui-default-default-k8s_memory_usage, so no anomalies were reported even though the kibana-metrics-ui-default-default-k8s_memory_usage job existed and had anomalies.

To reproduce:

  • Make sure anomalies and ML jobs exist, then select Kubernetes Pods from the dropdown and confirm that anomalies appear in the Anomaly Flyout. (screenshot: Screen Shot 2021-04-15 at 10 02 44 AM)

  • Open the "Jobs" tab in the flyout, click the Manage ML jobs button, and delete kibana-metrics-ui-default-default-k8s_network_out.

  • Go back to the Anomaly Flyout and select Kubernetes Pods from the dropdown. No anomalies are shown for any k8s job, and nothing appears in the network or console tab because the error is caught but not rethrown.
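The behavior in these steps can be reduced to a small sketch. This is a hypothetical reduction, not the actual Kibana code: the function names, the in-memory job store, and the score value are assumptions made for illustration, and the fetcher is synchronous only to keep the example short. Only the job ids and the MLJobNotFound error name come from this issue.

```typescript
// Hypothetical reduction of the failure mode. This is not the actual Kibana code,
// and the fetcher is synchronous only to keep the sketch short.

interface Anomaly {
  jobId: string;
  score: number;
}

type FetchAnomalies = (jobId: string) => Anomaly[];

const NETWORK_OUT_JOB = 'kibana-metrics-ui-default-default-k8s_network_out';
const MEMORY_USAGE_JOB = 'kibana-metrics-ui-default-default-k8s_memory_usage';

// In-memory stand-in for the ML backend: only the memory job still exists.
// The score is a made-up example value.
const jobStore: { [jobId: string]: Anomaly[] | undefined } = {};
jobStore[MEMORY_USAGE_JOB] = [{ jobId: MEMORY_USAGE_JOB, score: 80 }];

function fetchFromStore(jobId: string): Anomaly[] {
  const found = jobStore[jobId];
  if (found === undefined) {
    const err = new Error("No known job with id '" + jobId + "'");
    err.name = 'MLJobNotFound';
    throw err;
  }
  return found;
}

function fetchAllSwallowingErrors(jobIds: string[], fetchAnomalies: FetchAnomalies): Anomaly[] {
  const anomalies: Anomaly[] = [];
  try {
    for (const jobId of jobIds) {
      for (const anomaly of fetchAnomalies(jobId)) {
        anomalies.push(anomaly);
      }
    }
    return anomalies;
  } catch (e) {
    // Caught but neither rethrown nor reported: the caller just sees "no anomalies",
    // and anything collected from other jobs is discarded.
    return [];
  }
}

// The deleted network job fails first, so the memory job's anomaly never reaches the table.
const shownAnomalies = fetchAllSwallowingErrors([NETWORK_OUT_JOB, MEMORY_USAGE_JOB], fetchFromStore);
console.log(shownAnomalies.length); // 0
```

With a single try/catch around the whole loop, one missing job is enough to blank the table, and nothing reaches the console or the UI.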

I partly discovered this after locally implementing the toast error in useTrackedPromise, which displays on promise rejections. Once that is merged, we could show a useful error or warning here if desired. (screenshot: Screen Shot 2021-04-14 at 10 05 37 AM)

Task:

I'm not sure how to handle this scenario. If one job doesn't exist, we could either return an error asking the user to recreate the jobs and show no jobs, or skip that job, keep returning anomalies from the jobs that do exist, and optionally warn the user about the jobs that are missing. Perhaps we should check for missing jobs as soon as the user opens the flyout and ask them to recreate the jobs there, where we already have other messaging. (screenshot: Screen Shot 2021-04-15 at 11 02 00 AM)
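For the "skip that job and continue" option, a minimal sketch could look like the following. Everything here is an assumption for illustration rather than the Metrics UI implementation: the function names, the result shape with a missingJobIds list, the in-memory job store, and the score value are hypothetical, and the fetcher is synchronous only for brevity. It assumes a missing job can be recognized by the MLJobNotFound error name seen in the log above.

```typescript
// Hypothetical sketch of "skip missing jobs, keep the rest, report what was skipped".
// Not the actual Kibana code; the fetcher is synchronous only to keep it short.

interface Anomaly {
  jobId: string;
  score: number;
}

type FetchAnomalies = (jobId: string) => Anomaly[];

interface AnomalyResult {
  anomalies: Anomaly[];
  missingJobIds: string[]; // jobs the caller may want to warn about
}

const NETWORK_OUT_JOB = 'kibana-metrics-ui-default-default-k8s_network_out';
const MEMORY_USAGE_JOB = 'kibana-metrics-ui-default-default-k8s_memory_usage';

// In-memory stand-in for the ML backend; the score is a made-up example value.
const jobStore: { [jobId: string]: Anomaly[] | undefined } = {};
jobStore[MEMORY_USAGE_JOB] = [{ jobId: MEMORY_USAGE_JOB, score: 80 }];

function fetchFromStore(jobId: string): Anomaly[] {
  const found = jobStore[jobId];
  if (found === undefined) {
    const err = new Error("No known job with id '" + jobId + "'");
    err.name = 'MLJobNotFound';
    throw err;
  }
  return found;
}

// Assumption: a missing job is identified by the error name.
function isJobNotFound(e: unknown): boolean {
  return e instanceof Error && e.name === 'MLJobNotFound';
}

function fetchSkippingMissingJobs(jobIds: string[], fetchAnomalies: FetchAnomalies): AnomalyResult {
  const result: AnomalyResult = { anomalies: [], missingJobIds: [] };
  for (const jobId of jobIds) {
    try {
      for (const anomaly of fetchAnomalies(jobId)) {
        result.anomalies.push(anomaly);
      }
    } catch (e) {
      // Only a missing job is tolerated; any other failure still propagates.
      if (!isJobNotFound(e)) {
        throw e;
      }
      result.missingJobIds.push(jobId);
    }
  }
  return result;
}

const tolerant = fetchSkippingMissingJobs([NETWORK_OUT_JOB, MEMORY_USAGE_JOB], fetchFromStore);
console.log(tolerant.anomalies.length); // 1
console.log(tolerant.missingJobIds); // the deleted network job
```

The caller could render the anomalies as usual and use missingJobIds to decide whether to show a warning or a prompt to recreate the jobs. With asynchronous fetchers, the same per-job isolation could be built on Promise.allSettled instead of a try/catch inside the loop.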

Note that although this example uses K8s, the issue most likely also exists for Host anomalies.

neptunian, Apr 15 '21 15:04

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

elasticmachine, Apr 15 '21 15:04

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

elasticmachine, Nov 14 '23 00:11

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic[bot], May 12 '24 00:05

Closing - we'll rework ML jobs once we are using EEM

roshan-elastic, Sep 02 '24 12:09