icinga2

Multiple Notifications: State change detected from DOWN to DOWN

Open · danielmoser96 opened this issue on Jul 11, 2024 · 2 comments

Describe the bug

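We have noticed that notifications are sent multiple times even though interval = 0 is set. The cause is that Icinga sporadically detects state changes from DOWN to DOWN (hosts) or from CRITICAL to CRITICAL (services). In the log below, a hard DOWN to DOWN change triggers a reminder notification, although Icinga logged shortly before that interval=0 disables reminder notifications.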

Debug log:

[2024-07-11 11:10:10 +0200] notice/NotificationComponent: Reminder notification 'XXX!Prio_123_Mail_Host': Notification was sent out once and interval=0 disables reminder notifications.
[2024-07-11 11:10:42 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:10:42 +0200 (1.72069e+09) to next check time at 2024-07-11 11:10:47 +0200 (1.72069e+09).
[2024-07-11 11:10:42 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:11:21 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:11:21 +0200 (1.72069e+09) to next check time at 2024-07-11 11:11:25 +0200 (1.72069e+09).
[2024-07-11 11:11:21 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:11:59 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:11:59 +0200 (1.72069e+09) to next check time at 2024-07-11 11:12:03 +0200 (1.72069e+09).
[2024-07-11 11:11:59 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:12:37 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:12:37 +0200 (1.72069e+09) to next check time at 2024-07-11 11:12:42 +0200 (1.72069e+09).
[2024-07-11 11:12:37 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:13:16 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:13:15 +0200 (1.72069e+09) to next check time at 2024-07-11 11:13:45 +0200 (1.72069e+09).
[2024-07-11 11:13:16 +0200] notice/Checkable: State Change: Checkable 'XXX' hard state change from DOWN to DOWN detected.
[2024-07-11 11:13:16 +0200] notice/NotificationComponent: Attempting to send reminder notification 'XXX!Prio_123_Mail_Host'.
[2024-07-11 11:13:16 +0200] notice/Notification: Attempting to send reminder notifications of type 'Problem' for notification object 'XXX!Prio_123_Mail_Host'.
[2024-07-11 11:13:16 +0200] debug/Notification: User 'Mail' notification 'XXX!Prio_123_Mail_Host', Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=32)
[2024-07-11 11:13:16 +0200] debug/Notification: User 'Mail' notification 'XXX!Prio_123_Mail_Host', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2024-07-11 11:13:16 +0200] information/Notification: Sending reminder 'Problem' notification 'XXX!Prio_123_Mail_Host' for user 'Mail'
[2024-07-11 11:13:16 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/xxx/action_notify_by_mail.py' ... : PID 460440
[2024-07-11 11:13:16 +0200] information/Notification: Completed sending 'Problem' notification 'XXX!Prio_123_Mail_Host' for checkable 'XXX' and user 'Mail' using command 'action_notify_by_mail_host'.
[2024-07-11 11:13:16 +0200] notice/Process: PID 460440 ('/usr/lib64/nagios/plugins/xxx/action_notify_by_mail.py' ... ) terminated with exit code 0
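To check how often this happens in a longer debug log, the pattern above can be searched for mechanically. The following is a minimal Python sketch, assuming the log lines have exactly the format shown above. The function scan_debug_log and its regular expressions are hypothetical helpers, not part of Icinga. It collects every state change whose old and new state are identical and counts the "Sending ... notification" lines per notification object. The optional "reminder " prefix assumes that non-reminder notifications are logged with the same wording minus that word.

```python
import re
from collections import Counter

# Matches e.g. "[2024-07-11 11:13:16 +0200] notice/Checkable: State Change:
#   Checkable 'XXX' hard state change from DOWN to DOWN detected."
STATE_CHANGE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] notice/Checkable: State Change: Checkable '(?P<obj>[^']+)' "
    r"(?P<kind>soft|hard) state change from (?P<old>\w+) to (?P<new>\w+) detected\."
)

# Matches e.g. "... information/Notification: Sending reminder 'Problem'
#   notification 'XXX!Prio_123_Mail_Host' for user 'Mail'".
# Assumption: non-reminder sends use the same wording without "reminder ".
SENDING = re.compile(
    r"information/Notification: Sending (?:reminder )?'(?P<ntype>\w+)' "
    r"notification '(?P<notif>[^']+)'"
)


def scan_debug_log(lines):
    """Return (self_transitions, sends).

    self_transitions: list of (timestamp, checkable, soft/hard, state) for every
    state change whose old and new state are the same (e.g. DOWN -> DOWN).
    sends: Counter of "Sending ... notification" lines per notification object.
    """
    self_transitions = []
    sends = Counter()
    for line in lines:
        change = STATE_CHANGE.search(line)
        if change:
            if change["old"] == change["new"]:
                self_transitions.append(
                    (change["ts"], change["obj"], change["kind"], change["old"])
                )
            continue
        sent = SENDING.search(line)
        if sent:
            sends[sent["notif"]] += 1
    return self_transitions, sends
```

For example, scan_debug_log(open("/var/log/icinga2/debug.log")) would return the self-transitions and the per-notification send counts. That path is only an assumption (the usual location when the debuglog feature is enabled); adjust it to wherever your debug output is written.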

To Reproduce

  1. Create a Host with an IP which is not available.
  2. Enable this Notification Rule:
zones.d/master/notification_apply.conf
apply Notification "Prio_123_Mail_Host" to Host {
    command = "action_notify_by_mail_host"
    interval = 0s
    period = "24x7"
    assign where host.name
    states = [ Down ]
    types = [ Problem ]
    users = [ "Mail" ]
}
  3. Wait some time.

Expected behavior

Icinga should not detect a host state change from DOWN to DOWN, and with interval = 0 no further notification should be sent while the host stays DOWN.
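The expectation can be stated as a small model. This is a simplified Python sketch of the expected behavior described above, not Icinga's actual notification logic: it only looks at a sequence of hard host states, assumes interval = 0, and ignores soft states, downtimes, acknowledgements and time periods. The function name expected_problem_notifications is hypothetical.

```python
# Host problem states; a service version would use WARNING, CRITICAL and UNKNOWN.
PROBLEM_STATES = {"DOWN"}


def expected_problem_notifications(hard_states):
    """Return how many Problem notifications this simplified model expects
    for a sequence of hard host states when interval = 0."""
    count = 0
    previous = "UP"  # assumption: the host starts in a non-problem state
    for state in hard_states:
        # A DOWN -> DOWN "change" is not a real transition, so it never
        # produces another notification; only entering a problem state does.
        if state != previous and state in PROBLEM_STATES:
            count += 1
        previous = state
    return count


# The situation from the log: the host stays DOWN after the first notification.
print(expected_problem_notifications(["UP", "DOWN", "DOWN", "DOWN"]))  # 1
```

In the reported case, the hard DOWN to DOWN change at 11:13:16 produced a second notification, whereas this model would still count only one.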

Your Environment


  • Version used (icinga2 --version):

r2.14.2-1

  • Operating System and version:

Red Hat Enterprise Linux 8.10

  • Enabled features (icinga2 feature list):

api checker command graphite icingadb livestatus mainlog notification

  • Icinga Web 2 version and modules (System - About):

2.12.1

  • Config validation (icinga2 daemon -C):

[2024-07-11 11:26:03 +0200] information/cli: Icinga application loader (version: r2.14.2-1)
[2024-07-11 11:26:03 +0200] information/cli: Loading configuration file(s).
[2024-07-11 11:26:03 +0200] information/ConfigItem: Committing config item(s).
[2024-07-11 11:26:03 +0200] warning/Zone: The Zone object 'master' has more than two endpoints. Due to a known issue this type of configuration is strongly discouraged and may cause Icinga to use excessive amounts of CPU time.
[2024-07-11 11:26:03 +0200] information/ApiListener: My API identity: srv123.xxx.com
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 LivestatusListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1937 Downtimes.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 354 Dependencies.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 8 Users.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 12 TimePeriods.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ServiceGroup.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 6609 Services.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1466 ScheduledDowntimes.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 4 Zones.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 11 NotificationCommands.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 19039 Notifications.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 2359 Hosts.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 81 HostGroups.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 3 Endpoints.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 38 Comments.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 13 ApiUsers.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 337 CheckCommands.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2024-07-11 11:26:03 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-07-11 11:26:03 +0200] information/cli: Finished validating the configuration file(s).

  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

object Endpoint "srv100.xxx.com" {
  host = "srv100.xxx.com"
  port = "5665"
  log_duration = 10m
}

object Endpoint "srv101.xxx.com" {
  host = "srv101.xxx.com"
  port = "5665"
  log_duration = 10m
}

object Endpoint "srv102.xxx.com" {}

object Zone "master" {
  endpoints = [ "srv100.xxx.com", "srv101.xxx.com", "srv102.xxx.com" ]
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

danielmoser96, Jul 11 '24 09:07