beats icon indicating copy to clipboard operation
beats copied to clipboard

[Filebeat] Add health status reporting to AWS CloudWatch input

Open taylor-swanson opened this issue 6 months ago • 1 comments

Update the AWS CloudWatch input to report its health status, providing better visibility into the operational state for users.

The AWS CloudWatch input should report its health status using the context.UpdateStatus() method. Status changes should only be reported when the status actually changes (don't repeatedly send HEALTHY if the status is already HEALTHY).

Status Reporting

The input should report the following states:

  • STARTING: When the input is initializing before attempting to connect to AWS.
  • CONFIGURING: When setting up AWS client and validating configuration.
  • HEALTHY: When successfully connected to AWS.
  • DEGRADED: When encountering non-fatal errors but can still operate. Examples:
    • Temporary network issues
    • Temporary permission issues that might resolve
  • FAILED: When encountering fatal errors preventing operation. Examples:
    • Authentication failures (invalid credentials)
    • Insufficient permissions to access resources
    • Persistent network connectivity issues

Include relevant context information with non-HEALTHY states:

  • For authentication issues: Include error message and status code (e.g., 401, 403
  • For connectivity issues: Include error message and endpoint being accessed
  • For resource not found: Include resource name and error message

References

See the CEL input implementation for reference: https://github.com/elastic/beats/blob/6409d005a31e0147a40f0872794880b3df15a69c/x-pack/filebeat/input/cel/input.go#L111

taylor-swanson avatar Jun 04 '25 13:06 taylor-swanson

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

elasticmachine avatar Jun 16 '25 14:06 elasticmachine

@taylor-swanson regarding the state change and validation of the chenge - will this create a log entry that's visible in discover? Or the state change only reflected in the agent deployment only ?

Kavindu-Dodan avatar Jul 30 '25 22:07 Kavindu-Dodan

@Kavindu-Dodan, there is a debug log that gets created when the status is updated. Otherwise, the status type and message is propagated up through agent's management protocol and will be displayed in the Agent details screen in Kibana.

taylor-swanson avatar Jul 31 '25 12:07 taylor-swanson

@taylor-swanson thank you. This helps with the development efforts. As an update, team is working on both CloudWatch & S3 inputs with the goal to complete within the current iteration.

Kavindu-Dodan avatar Jul 31 '25 13:07 Kavindu-Dodan

Sounds great, thank you!

This page also goes into a bit more detail about health status reporting: https://www.elastic.co/docs/reference/fleet/monitor-elastic-agent

taylor-swanson avatar Jul 31 '25 15:07 taylor-swanson

@taylor-swanson thanks for the resource. I have opened the PR for reviewing and hope you can have a look when you have time.

Kavindu-Dodan avatar Aug 01 '25 19:08 Kavindu-Dodan