kibana return error message when usage collector failed

Fixes: #135938

This PR adds two new fields (error_messages and has_errors) to the alerting and actions telemetry objects, so that telemetry aggregation failures are shown explicitly in the telemetry data.

Jul 18 '22 09:07 ersin-erdal

Pinging @elastic/response-ops (Team:ResponseOps)

Jul 18 '22 15:07 elasticmachine

Would it make more sense to keep a running count of the number of times the telemetry has failed for alerting/actions vs returning an array of error messages? We definitely don't want to map the error messages field, but then that makes it difficult to search for in the telemetry data. Or, if we do want the error messages, maybe we should also keep a running count so that in the telemetry data, we can at least filter for num_errors > 0 to filter the telemetry data down to just those containing errors?

Jul 20 '22 13:07 ymao1

In the issue, a success: boolean field is also recommended to make searching easy, but after a discussion we thought it might be a bit misleading. For example, in actions usage we collect data from 3 different sources, then merge them. What should we show if only one of them fails: success true or false? We can add it back, or add a num_errors field as you said, but that might be a bit tricky: what if getLatestTaskState fails? We can't know the number of previous failures... Adding the success field back sounds better to me :) WDYT?

Jul 20 '22 13:07 ersin-erdal

> We can add it back, or add a num_errors field as you said but might be a bit tricky, what if getLatestTaskState fails? we can't know the number of previous failures...

Gotcha. That makes sense. What about instead of a success field, we name it hasErrors? Then we can set it to true if any of the sources throws an error?
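
To make the idea concrete, here is a rough sketch of "set it to true if any of the sources throws", not the actual code in this PR. Only the has_errors and error_messages field names come from the PR description; `UsageResult`, `Source`, `collectUsage`, and the `counts` field are hypothetical names for illustration, and the sources are kept synchronous for brevity (real collectors would presumably be async).

```typescript
// Hypothetical result shape: only has_errors / error_messages are taken from the PR description.
interface UsageResult {
  has_errors: boolean;
  error_messages?: string[];
  counts: Record<string, number>;
}

// Hypothetical source: returns partial counts, or throws on failure.
type Source = () => Record<string, number>;

function collectUsage(sources: Source[]): UsageResult {
  const counts: Record<string, number> = {};
  const errorMessages: string[] = [];

  for (const source of sources) {
    try {
      // Merge whatever this source managed to collect.
      Object.assign(counts, source());
    } catch (err) {
      // Record the failure but keep collecting from the remaining sources.
      errorMessages.push(err instanceof Error ? err.message : String(err));
    }
  }

  const hasErrors = errorMessages.length > 0;
  return {
    has_errors: hasErrors,
    // Only include the messages when something actually failed.
    ...(hasErrors ? { error_messages: errorMessages } : {}),
    counts,
  };
}
```

With this shape, a partial failure still reports the data that was collected, and has_errors gives a simple boolean to filter on without having to know how many earlier runs failed.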

Jul 20 '22 14:07 ymao1

hasErrors sounds very good :) Adding it.

edit: done :)

Jul 20 '22 14:07 ersin-erdal

:green_heart: Build Succeeded

Buildkite Build
Commit: 6235c053193df3ac4ee46fa6342549fe1dc1f2bf

Metrics [docs]

✅ unchanged

History

:yellow_heart: Build #63547 was flaky c29fcb778c1cb935bd1ad8e4f67a842d9c524dec
:yellow_heart: Build #63234 was flaky 8004308709074e39d48a362f16b628cac00893f7
:yellow_heart: Build #60891 was flaky f976ed3cc32f99c06fcf145e7d949c1615d95576
:broken_heart: Build #60838 failed 5e4a6c7f7afb09b6edb61500e6c29dcb6c9d60ff
:green_heart: Build #60769 succeeded 2954c9360c9bddd58ead37c2499f30462a239057
:yellow_heart: Build #60321 was flaky 4e93b916e53c32991ca10b4fb0b035cbceba318c

To update your PR or re-run it, just comment with: @elasticmachine merge upstream

Aug 10 '22 11:08 kibana-ci
