
cloud::aws::cloudwatch::plugin --mode=get-alarms: plugin not detecting active alarms

Open raploureiro opened this issue 3 months ago • 3 comments

Hello everyone, I'm reporting this bug and hope I can contribute.

Bug report

Quick description

We have an active alarm, which can be confirmed with the --debug option: the output shows <StateValue>ALARM</StateValue> for the "Instance memory" alarm. However, the final result is: OK: 0 problem(s) detected | 'alerts'=0;;;0;

How to reproduce

Please provide the initial conditions to reproduce the bug down below

  • Environment: AlmaLinux release 9.6 (Sage Margay)
  • Version of the plugin: centreon-plugin-Cloud-Aws-Cloudwatch-Api-20250800-1.el9.noarch
  • Information about the monitored resource: AWS Cloudwatch aws-cli/2.28.22 Python/3.13.7 Linux/5.14.0-570.24.1.el9_6.x86_64 exe/x86_64.almalinux.9
  • Command line: /usr/lib/centreon/plugins/centreon_aws_cloudwatch_api.pl --plugin=cloud::aws::cloudwatch::plugin --mode=get-alarms --custommode='awscli' --aws-secret-key='' --aws-access-key='' --aws-role-arn='' --proxyurl='' --region='sa-east-1' --filter-alarm-name='' --warning-status='%{state_value} =~ /INSUFFICIENT_DATA/i' --critical-status='%{state_value} =~ /ALARM/i' --verbose.

Expected result

CRITICAL: 1 problem(s) detected
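
For reference, the expected result follows from applying the two status expressions in the command line to each alarm's StateValue. The Python sketch below mimics that evaluation on a hand-trimmed describe-alarms JSON document. It is not the plugin's Perl code: the OK/WARNING/CRITICAL precedence and the summary wording are assumptions modeled on the plugin output quoted in this issue.

```python
import json
import re

# Hand-trimmed sample: only the keys this sketch reads. Real describe-alarms
# JSON output carries many more fields per alarm.
SAMPLE_OUTPUT = json.dumps({
    "MetricAlarms": [
        {"AlarmName": "Instance memory", "StateValue": "ALARM"},
    ],
    "CompositeAlarms": [],
})

# Python equivalents of the command-line options
#   --warning-status='%{state_value} =~ /INSUFFICIENT_DATA/i'
#   --critical-status='%{state_value} =~ /ALARM/i'
WARNING_STATUS = re.compile(r"INSUFFICIENT_DATA", re.IGNORECASE)
CRITICAL_STATUS = re.compile(r"ALARM", re.IGNORECASE)


def evaluate(raw_json: str) -> str:
    """Return a Centreon-style summary line (assumed format) for the alarms."""
    data = json.loads(raw_json)
    alarms = data.get("MetricAlarms", []) + data.get("CompositeAlarms", [])
    worst = "OK"
    problems = 0
    for alarm in alarms:
        state = alarm.get("StateValue", "")
        if CRITICAL_STATUS.search(state):
            # Critical outranks warning, so it becomes the overall status.
            problems += 1
            worst = "CRITICAL"
        elif WARNING_STATUS.search(state):
            problems += 1
            if worst == "OK":
                worst = "WARNING"
    return f"{worst}: {problems} problem(s) detected"


print(evaluate(SAMPLE_OUTPUT))  # CRITICAL: 1 problem(s) detected
```

With one alarm in the ALARM state, the critical expression matches once, which is exactly the "CRITICAL: 1 problem(s) detected" line expected above.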

Actual result

Excerpt from the --debug output (truncated at the start):

<TreatMissingData>breaching</TreatMissingData>
<AlarmConfigurationUpdatedTimestamp>2025-09-01T18:58:33.193Z</AlarmConfigurationUpdatedTimestamp>
<StateValue>ALARM</StateValue>
<Threshold>40.0</Threshold>
<StateReason>Threshold Crossed: no datapoints were received for 1 period and 1 missing datapoint was treated as [Breaching].</StateReason>
<InsufficientDataActions/>
<StateTransitionedTimestamp>2025-09-02T04:39:15.668Z</StateTransitionedTimestamp>
<AlarmActions/>
<StateUpdatedTimestamp>2025-09-02T04:39:15.668Z</StateUpdatedTimestamp>
<ComparisonOperator>GreaterThanThreshold</ComparisonOperator>
<AlarmName>Instance memory</AlarmName>
<EvaluationPeriods>1</EvaluationPeriods>
<StateReasonData>{"version":"1.0","queryDate":"2025-09-02T04:39:15.664+0000","period":300,"recentDatapoints":[],"threshold":40.0,"evaluatedDatapoints":[{"timestamp":"2025-09-02T04:34:00.000+0000"}]}</StateReasonData>
<ActionsEnabled>true</ActionsEnabled>
<DatapointsToAlarm>1</DatapointsToAlarm>
<Metrics>
  <Expression>MAX(e1)</Expression>
  <ReturnData>true</ReturnData>
  <Label>Instance memory usage</Label>
  <Id>e2</Id>
  <Period>300</Period>
  <Expression>SELECT AVG(mem_used_percent) FROM SCHEMA(CWAgent,InstanceId) GROUP BY InstanceId ORDER BY AVG() DESC</Expression>
  <ReturnData>false</ReturnData>
  <Label>Instance</Label>
  <Id>e1</Id>
</Metrics>
<CreationId>87b60bf6-d984-4612-8e58-c77ed25e9809/1751060096124</CreationId>
<OKActions/>
<AlarmArn>arn:aws:cloudwatch:sa-east-1:074071149174:alarm:Instance memory</AlarmArn>
<Dimensions/>
</MetricAlarms>
</DescribeAlarmsResult>
<ResponseMetadata>
  <RequestId>af6003e5-476b-400c-a06f-77ce382f0c99</RequestId>
</ResponseMetadata>
</DescribeAlarmsResponse>

2025-09-02 09:44:31,070 - MainThread - botocore.hooks - DEBUG - Event needs-retry.cloudwatch.DescribeAlarms: calling handler <botocore.retryhandler.RetryHandler object at 0x7f8741f3e650>
2025-09-02 09:44:31,071 - MainThread - botocore.retryhandler - DEBUG - No retry needed.
OK: 0 problem(s) detected | 'alerts'=0;;;0;
Command line: 'aws cloudwatch describe-alarms --region sa-east-1 --output json --debug'
ok: status : skipped (no value(s))
ok: status : skipped (no value(s))

raploureiro, Sep 03 '25 15:09
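
The alarm in the debug output of this issue is defined with metric math: its Metrics element holds Expression entries and its Dimensions element is empty. The Python sketch below is a hypothetical diagnostic aid that lists each alarm from describe-alarms JSON output with its state and whether it is built from a Metrics list or a single top-level MetricName. The sample document is a hand-made reconstruction of that alarm, not real CLI output, and whether this alarm shape is related to the bug is not established in this thread.

```python
import json

# Hypothetical sample reconstructed from the alarm in the debug output:
# a metric-math alarm with a "Metrics" list of "Expression" entries and
# no top-level "MetricName".
SAMPLE_OUTPUT = json.dumps({
    "MetricAlarms": [
        {
            "AlarmName": "Instance memory",
            "StateValue": "ALARM",
            "Dimensions": [],
            "Metrics": [
                {"Id": "e2", "Expression": "MAX(e1)", "ReturnData": True},
                {
                    "Id": "e1",
                    "Expression": "SELECT AVG(mem_used_percent) FROM SCHEMA(CWAgent,InstanceId) "
                                  "GROUP BY InstanceId ORDER BY AVG() DESC",
                    "ReturnData": False,
                },
            ],
        },
    ],
})


def summarize(raw_json: str) -> list[tuple[str, str, str]]:
    """List (alarm name, state, alarm kind) for every metric alarm."""
    rows = []
    for alarm in json.loads(raw_json).get("MetricAlarms", []):
        # Metric-math alarms describe their data through "Metrics";
        # single-metric alarms carry "MetricName" at the top level.
        if "Metrics" in alarm and "MetricName" not in alarm:
            kind = "metric-math"
        else:
            kind = "single-metric"
        rows.append((alarm.get("AlarmName", "?"), alarm.get("StateValue", "?"), kind))
    return rows


for name, state, kind in summarize(SAMPLE_OUTPUT):
    print(f"{name}: {state} ({kind})")
```

On real data, one could pass the standard output of aws cloudwatch describe-alarms --region sa-east-1 --output json (run without --debug, so only the JSON document is captured) to summarize().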

Hi, to help us understand the problem, could you run the command aws cloudwatch describe-alarms --region sa-east-1 --output json --debug the next time the problem occurs and post the result (please make sure to anonymize any sensitive data)? Thanks

scresto31, Sep 19 '25 09:09
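
Before posting debug output, as requested above, identifiers can be masked with a short script. This Python sketch rests on assumptions: the patterns (access key IDs starting with AKIA or ASIA, 12-digit account IDs, UUID-style request IDs) are common shapes of AWS identifiers but not an exhaustive list, so the result should still be reviewed by hand.

```python
import re

# Assumed, non-exhaustive patterns for identifiers worth masking.
# UUIDs are handled before account IDs so that an all-digit 12-character
# UUID tail is not replaced as an account ID first.
PATTERNS = [
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "<ACCESS_KEY_ID>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{12}\b"), "<ACCOUNT_ID>"),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "<AlarmArn>arn:aws:cloudwatch:sa-east-1:074071149174:alarm:Instance memory</AlarmArn>"
print(redact(sample))
```

To apply it to a saved file such as outputAws.txt, read the file, pass its contents to redact(), and write the result to a new file.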

outputAws.txt

Hello,

I am attaching the file outputAws.txt, which contains the output of the command aws cloudwatch describe-alarms --region sa-east-1 --output json --debug.

Thanks.

On Fri, Sep 19, 2025 at 06:42, Sylvain Cresto < @.***> wrote:

scresto31 left a comment (centreon/centreon-plugins#5731) https://github.com/centreon/centreon-plugins/issues/5731#issuecomment-3311480399


raploureiro, Sep 19 '25 14:09

Hi, thank you for this information. We have identified the issue and will fix it.

scresto31, Sep 26 '25 09:09