[Filebeat] Add health status reporting to AWS CloudWatch input
Update the AWS CloudWatch input to report its health status, providing better visibility into the operational state for users.
The AWS CloudWatch input should report its health status using the context.UpdateStatus() method. A status should only be reported when it actually changes (for example, don't repeatedly send HEALTHY if the status is already HEALTHY).
Status Reporting
The input should report the following states:
- STARTING: When the input is initializing before attempting to connect to AWS.
- CONFIGURING: When setting up AWS client and validating configuration.
- HEALTHY: When successfully connected to AWS.
- DEGRADED: When encountering non-fatal errors but can still operate. Examples:
  - Temporary network issues
  - Temporary permission issues that might resolve
- FAILED: When encountering fatal errors preventing operation. Examples:
  - Authentication failures (invalid credentials)
  - Insufficient permissions to access resources
  - Persistent network connectivity issues
Include relevant context information with non-HEALTHY states:
- For authentication issues: Include error message and status code (e.g., 401, 403)
- For connectivity issues: Include error message and endpoint being accessed
- For resource not found: Include resource name and error message
References
See the CEL input implementation for reference: https://github.com/elastic/beats/blob/6409d005a31e0147a40f0872794880b3df15a69c/x-pack/filebeat/input/cel/input.go#L111
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)
@taylor-swanson regarding the state change and validation of the change: will this create a log entry that's visible in Discover? Or is the state change reflected only in the agent deployment?
@Kavindu-Dodan, there is a debug log that gets created when the status is updated. Otherwise, the status type and message are propagated up through agent's management protocol and will be displayed in the Agent details screen in Kibana.
@taylor-swanson thank you. This helps with the development efforts. As an update, the team is working on both the CloudWatch & S3 inputs, with the goal of completing them within the current iteration.
Sounds great, thank you!
This page also goes into a bit more detail about health status reporting: https://www.elastic.co/docs/reference/fleet/monitor-elastic-agent
@taylor-swanson thanks for the resource. I have opened the PR for review and hope you can have a look when you have time.