icinga2 icon indicating copy to clipboard operation
icinga2 copied to clipboard

TimePeriod: consult GetIncludes() and GetExcludes() in IsInside(), not UpdateRegion()

Open Al2Klimov opened this issue 4 years ago • 36 comments

~fixes~ doesn't fix #7398

Edit

The solution

2/3 commits just remove unused methods. 769de8d63cbc8af2e1f48878cfa1ab8a2683df14 is your friend.

Instead of (error prone) merging the timeperiod's own calculated segments (e.g. 1640000000 - 1670000000) with the ones of the included/excluded timeperiods (e.g. 1660000000 - 1690000000) for using the merge result in one IsInside() call, this PR keeps each timeperiod's own calculated segments isolated from each other.

Due to this IsInside() has to call included/excluded timeperiods' IsInside() now.

~The problem~

~IMAO Yonas already summarised the problem clearly in https://github.com/Icinga/icinga2/pull/7855#issuecomment-1162817801 .~

Edit II

"error prone" means that the order of periodically updated timeperiods matter. I.e. ideally the timeperiods used for including/excluding are updated first, then the ones refering to them. In the opposite case you have a problem as the computed ranges of parent_TP are merged with the computed ranges of child_TP before the latter are updated.

Al2Klimov avatar Feb 26 '20 14:02 Al2Klimov

@cla-bot check

Al2Klimov avatar Aug 04 '21 12:08 Al2Klimov

And maybe it also fixes #8741.

Al2Klimov avatar Apr 04 '22 11:04 Al2Klimov

I'm very interested for solving #8741. Thatswhy I try to bring focus back on this issue. I have some news which maybe would help for further debug:

This bug affected me already since icinga2-2.12.3-1.el7. With icinga2-2.13.2-1.el7 the bug was gone. On 2. May I updated to icinga2-2.13.3-1.el7 and the bug is back. Yesterday was the 1. May (holiday). Notifications arrived at 2. May in the morning with icinga2-2.13.2-1.el7, right after update to icinga2-2.13.3-1.el7, the notifications are missing again, as it treats 2. May as a holiday again.

davixd avatar May 03 '22 15:05 davixd

Test cases for https://github.com/Icinga/icinga2/issues/7398

Config
object TimePeriod "24x7" {
  ranges = { wednesday = "00:00-23:59" }
}

object TimePeriod "offDuty" {
  ranges = { wednesday = "10:00-17:00" }
}

object TimePeriod "preferred_tp" {
  ranges = { wednesday = "09:00-11:00" }
}

object TimePeriod "onDuty" {
  excludes = ["offDuty"]
  //includes = ["preferred_tp"]
  ranges = { wednesday = "00:00-23:59" }
}

object User "john" {
  email = "[email protected]"
  period = "24x7"
}

object Host "test" {
  address = "127.0.0.1"
  check_command = "hostalive"
}

apply Notification "notify-hosts" to Host {
  command = "mail-host-notification"
  period = "onDuty"

  users = ["john"]

  states = [Up, Down]
  types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

  assign where true
}

Before

[2022-06-22 09:57:51 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
[2022-06-22 09:57:51 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 09:57:51 +0200] debug/Notification: Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 09:57:51 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 09:57:51 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2022-06-22 09:57:51 +0200] information/Notification: Sending 'DowntimeStart' notification 'test!notify-hosts' for user 'john'
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2022-06-22 09:57:51 +0200] information/Downtime: Triggering downtime 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d' for checkable 'test'.
[2022-06-22 09:57:51 +0200] debug/ApiListener: Sent update for object 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d': {"config":"object Downtime \"d8b19bc4-25ed-4a5b-8372-fae70be6473d\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655888271.000000\n\tentry_time = 1655884671.609968\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655884671.000000\n\ttriggered_by = \"\"\n\tversion = 1655884671.610063\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!d8b19bc4-25ed-4a5b-8372-fae70be6473d","original_attributes":[],"type":"Downtime","version":1655884671.610063,"zone":"master"}
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'config::UpdateObject' message
[2022-06-22 09:57:51 +0200] information/ConfigObjectUtility: Created and activated object 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d' of type 'Downtime'.
[2022-06-22 09:57:51 +0200] notice/Process: Running command '/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/scripts/mail-host-notification.sh' '-4' '127.0.0.1' '-6' '' '-b' 'icingaadmin' '-c' 'Cluster upgrade maintenance' '-d' '2022-06-22 09:57:51 +0200' '-l' 'test' '-n' 'test' '-o' 'execvpe(/Users/yhabteab/Workspace/icinga2/prefix/lib/nagios/plugins/check_ping) failed: No such file or directory' '-r' '[email protected]' '-s' 'DOWN' '-t' 'DOWNTIMESTART': PID 48059
[2022-06-22 09:57:51 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'ConfigObjectUtility::CreateObject'
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2022-06-22 09:57:51 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

After

[2022-06-22 10:33:58 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'
[2022-06-22 10:33:58 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2022-06-22 10:33:58 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message
[2022-06-22 10:33:58 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 10:33:58 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'
[2022-06-22 10:33:58 +0200] information/Downtime: Triggering downtime 'test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166' for checkable 'test'.
[2022-06-22 10:33:58 +0200] debug/ApiListener: Sent update for object 'test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166': {"config":"object Downtime \"61b2d5f8-9de9-4403-a51d-1c6ca87cf166\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655890437.000000\n\tentry_time = 1655886838.060954\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655886837.000000\n\ttriggered_by = \"\"\n\tversion = 1655886838.061191\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166","original_attributes":[],"type":"Downtime","version":1655886838.061191,"zone":"master"}

Now, comment out the includes = ["preferred_tp"] attribute of onDuty TimePeriod.

[2022-06-22 10:35:44 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
[2022-06-22 10:35:44 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message
[2022-06-22 10:35:44 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 10:35:44 +0200] debug/Notification: Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 10:35:44 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 10:35:44 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2022-06-22 10:35:44 +0200] information/Notification: Sending 'DowntimeStart' notification 'test!notify-hosts' for user 'john'
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2022-06-22 10:35:44 +0200] information/Downtime: Triggering downtime 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb' for checkable 'test'.
[2022-06-22 10:35:44 +0200] debug/ApiListener: Sent update for object 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb': {"config":"object Downtime \"c8039a8c-2824-4e60-b315-5c7a14d175fb\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655890543.000000\n\tentry_time = 1655886944.024694\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655886943.000000\n\ttriggered_by = \"\"\n\tversion = 1655886944.024791\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!c8039a8c-2824-4e60-b315-5c7a14d175fb","original_attributes":[],"type":"Downtime","version":1655886944.024791,"zone":"master"}
[2022-06-22 10:35:44 +0200] information/ConfigObjectUtility: Created and activated object 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb' of type 'Downtime'.
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'config::UpdateObject' message
[2022-06-22 10:35:44 +0200] notice/Process: Running command '/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/scripts/mail-host-notification.sh' '-4' '127.0.0.1' '-6' '' '-b' 'icingaadmin' '-c' 'Cluster upgrade maintenance' '-d' '2022-06-22 10:35:44 +0200' '-l' 'test' '-n' 'test' '-o' 'execvpe(/Users/yhabteab/Workspace/icinga2/prefix/lib/nagios/plugins/check_ping) failed: No such file or directory' '-r' '[email protected]' '-s' 'DOWN' '-t' 'DOWNTIMESTART': PID 70037
[2022-06-22 10:35:44 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'ConfigObjectUtility::CreateObject'
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2022-06-22 10:35:44 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

yhabteab avatar Jun 22 '22 08:06 yhabteab

Running tests for https://github.com/Icinga/icinga2/issues/8741: http://10.27.0.228/icingaweb2/dashboard

yhabteab avatar Jun 22 '22 12:06 yhabteab

ref/IP/44756

Al2Klimov avatar Feb 22 '23 14:02 Al2Klimov

The code change looks non-trivial and the referenced issue is quite lengty, so please add a summary of the problem and how this is fixed by this PR to the description.

julianbrost avatar Feb 23 '23 15:02 julianbrost

IMAO Yonas already summarised the problem clearly in #7855 (comment) .

That comment only contains a test. Or do you consider the example configuration to be a sufficient description of the problem?

Apart from that, I can't really follow that test either. Let's consider the time periods on 2022-06-22 (which indeed was a Wednesday) at 09:57:51:

  • offDuty: it's not 10:00-17:00, so it's outside of that time period
  • onDuty: it's inside 00:00-23:59 and it's outside of the excluded offDutytime period, so we are insideonDuty`

And according to the "Before" log, a notification gets sent, which looks correct to me.

Now the "After" log is from 10:33:58, that means the offDuty period was entered, therefore the exclude in onDuty takes effect and no notification is sent, as it's now outside this time period, so also looks correct to me.

For the second "After" test, the time period preferred_tp becomes relevant. It's a time between 09:00-11:00, so we are inside that time period. onDuty now additionally includes this time period so that for the current time, it has both active includes and excludes. prefer_includes is not given explicitly, therefore the default true is used, i.e. we are inside onDuty. And indeed, the notification gets sent, which again looks correct to me.

So there are only correct cases in this test, so it doesn't demonstrate any change. Apart from that, the test is strange because there was a time period transition between the times the old and the new version was tested.

julianbrost avatar Mar 10 '23 16:03 julianbrost

And from pure code design PoV?

Instead of (error prone) merging the timeperiod's own calculated segments (e.g. 1640000000 - 1670000000) with the ones of the included/excluded timeperiods (e.g. 1660000000 - 1690000000) for using the merge result in one IsInside() call, this PR keeps each timeperiod's own calculated segments isolated from each other.

Due to this IsInside() has to call included/excluded timeperiods' IsInside() now.

Al2Klimov avatar Mar 20 '23 10:03 Al2Klimov

@Al2Klimov Please provide proof what has been actually fixed, i.e. the configuration from the Icinga ticket. Also, please verify whether this fixes #8741.

lippserd avatar Mar 30 '23 09:03 lippserd

Let me just describe this short here. In order to avoid time period transitions when testing both branches like last time, I have run this time both in parallel. The configuration I used is the same as in https://github.com/Icinga/icinga2/pull/7855#issuecomment-1162817801, only the time ranges have been adjusted.

  • offDuty: ranges thursday = "10:00-17:00"
  • preferred_tp: ranges thursday = "13:00-15:00"

and the onDuty time period excludes offDuty.

object TimePeriod "onDuty" {
  excludes = ["offDuty"]
  ranges = { thursday = "00:00-23:59" }
}

As a result, there is some strange behaviour in both branches. Sometimes the time period gets excluded correctly and sometimes not.

Master: Where the offDuty period doesn't get excluded.

[2023-03-30 12:50:35 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:50:35 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:50:35 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 12:50:35 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

Working Master:

[2023-03-30 12:52:54 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:52:54 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:52:54 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'

This PR: Where the offDuty period doesn't get excluded.

[2023-03-30 12:50:33 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:50:33 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:50:33 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 12:50:33 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

This PR: where the offDuty is correctly applied.

[2023-03-30 12:52:53 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:52:53 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:52:53 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'

Now, include the preferred_tp into the onDuty time period. includes = ["preferred_tp"]:

Working master:

[2023-03-30 13:08:44 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 13:08:44 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 13:08:44 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 13:08:44 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

This PR:

[2023-03-30 13:08:45 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 13:08:45 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 13:08:45 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 13:08:45 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.

Conclusion

This PR doesn't fix any problem, but doesn't introduce any different behaviour either. My tests show that the behavior remains exactly the same in both branches and nothing has changed with this PR. Also, I could reproduce this issue in both branches only partially, i.e. sometimes the offDuty period is not considered at all and when deleting the state file and restarting Icinga2 it works fine.

yhabteab avatar Mar 30 '23 11:03 yhabteab

What changed between

Master: Where the offDuty period doesn't get excluded.

and

Working Master:

as well as between

This PR: Where the offDuty period doesn't get excluded.

and

This PR: where the offDuty is correctly applied.

Are you trying to say that both show inconsistent behavior in the same test?

when deleting the state file and restarting Icinga2 it works fine.

Okay, time periods indeed seem to persist some state:

https://github.com/Icinga/icinga2/blob/82f9e4312da50670cf263cdd9f0f2ea4578e3fee/lib/icinga/timeperiod.ti#L33-L35

So looks like you have to be extra careful with tests so that no old and potentially incorrect (or inconsistent with the current implementation) state is used.

julianbrost avatar Mar 31 '23 09:03 julianbrost

What changed between

Master: Where the offDuty period doesn't get excluded.

and

Working Master:

as well as between

This PR: Where the offDuty period doesn't get excluded.

and

This PR: where the offDuty is correctly applied.

Nothing! Just removed the state file and start icinga2 again.

Are you trying to say that both show inconsistent behavior in the same test?

Weren't the tests clear enough? And I'm not trying but I prove it.

yhabteab avatar Apr 03 '23 07:04 yhabteab

OK, you've really almost made me think all this is quality-only.

Actually

Test 1

master, f3b148517

Config

globals.everyday = function(t) {
	return {
		"sunday" = t
		"monday" = t
		"tuesday" = t
		"wednesday" = t
		"thursday" = t
		"friday" = t
		"saturday" = t
	}
}

include <itl>

object CheckerComponent "cc" { }
object NotificationComponent "nc" { }

object TimePeriod "10-18" {
	ranges = everyday("10:00-11:20")
}

object TimePeriod "9-11" {
	ranges = everyday("9:00-11:00")
}

object TimePeriod "0-24" {
	excludes = [ "10-18" ]
	includes = [ "9-11" ]
	ranges = everyday("00:00-24:00")
}

object Host "h" {
	check_command = "dummy"
	vars.dummy_state = 2
	check_interval = 10
	retry_interval = 10
	volatile = true
	check_period = "0-24"
}

object NotificationCommand "nc" {
	command = [ "true" ]
}

object User "u" {
}

object Notification "n" {
	host_name = "h"
	command = "nc"
	users = [ "u" ]
}

Logs

[2023-04-06 11:15:52 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:16:02 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:02 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:02 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:16:11 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:11 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:11 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Fires too early.

[2023-04-06 11:18:49 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:20:04 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:04 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:04 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:20:13 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:13 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:13 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Still too early.

Test 2

master (f3b148517) + #7855

Config diff

object TimePeriod "10-18" {
	ranges = { "monday - friday" = "10:00-11:15" }
	ranges = everyday("10:00-11:35")
}

Logs

[2023-04-06 11:32:23 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:35:14 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:35:14 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:35:14 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:35:24 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:35:24 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:35:24 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Just in time!

Test 3

master (f3b148517) + #7855

Config diff

object TimePeriod "10-18" {
	ranges = { "monday - friday" = "10:00-11:15" }
	ranges = everyday("10:00-11:40")
}

Logs

[2023-04-06 11:37:15 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:40:17 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:40:17 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:40:17 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:40:27 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:40:27 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:40:27 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Just in time!

Conclusion

Let's not crampedly harden on the config(s) in #7398. Those title and description say it all. TimePeriods behave strange. And this PR fixes at least one strange behavior. Let's let several issue creators upgrade to 2.14 and test whether this fixes their stuff as well.

Even if I'm wrong, this is a good code quality thing. Instead of merging anything in everything, every TP manages just its own stuff. Less interferences – less problems.

Al2Klimov avatar Apr 06 '23 09:04 Al2Klimov

Logs

[2023-04-06 11:15:52 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:16:02 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:02 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:02 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:16:11 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:11 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:11 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Fires too early.

[2023-04-06 11:18:49 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:20:04 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:04 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:04 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:20:13 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:13 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:13 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.

Still too early.

What is "Still too early." in the second output? The excluded time period 10-18 ended at 11:20 (great naming!) so after that time, it's inside 0-24 again and the second output looks fine to me.

Also what was the difference between these two runs? Restart and maybe something with the state file?

Config diff

object TimePeriod "10-18" {
	ranges = { "monday - friday" = "10:00-11:15" }
	ranges = everyday("10:00-11:35")
}

Is that missing + and - line prefixes to actually be diff syntax? Or why are you setting ranges here twice (and overwrite the first line)? Also why now everyday(...) and "monday - friday", does this make a difference here?

julianbrost avatar Apr 06 '23 11:04 julianbrost

What is "Still too early." in the second output? The excluded time period 10-18 ended at 11:20 (great naming!) so after that time, it's inside 0-24 again and the second output looks fine to me.

Oh. Sorry, 🐘 oversight.

Also what was the difference between these two runs? Restart and maybe something with the state file?

Restart with untouched state file.

Config diff

object TimePeriod "10-18" {
	ranges = { "monday - friday" = "10:00-11:15" }
	ranges = everyday("10:00-11:35")
}

Is that missing + and - line prefixes to actually be diff syntax? Or why are you setting ranges here twice (and overwrite the first line)? Also why now everyday(...) and "monday - friday", does this make a difference here?

  1. I meant that are the objects I've changed. The difference to the previous config.
  2. Forget the first ranges, they are no-op and shouldn't be there.

Al2Klimov avatar Apr 06 '23 12:04 Al2Klimov

Does that affect your conclusion? I'm still not entirely sure if this is supposed to fix something or just clean up code.

Also what was the difference between these two runs? Restart and maybe something with the state file?

Restart with untouched state file.

And what about Test 2 and Test 3? Did these start with the state file from the previous test?

julianbrost avatar Apr 06 '23 12:04 julianbrost

Does that affect your conclusion? I'm still not entirely sure if this is supposed to fix something or just clean up code.

My conclusion stays. In any case it cleans up code very well. In my opinion it also fixes something as described above. It definitely doesn’t harm.

And what about Test 2 and Test 3? Did these start with the state file from the previous test?

No, with none at all.

Al2Klimov avatar Apr 06 '23 12:04 Al2Klimov

Intermediate result

The only thing changed here -TimePeriod- doesn’t have a getcurrent/next/whatever interface based on the segments, so only IsInside() matters. IMAO no harm possibility.

Al2Klimov avatar Apr 11 '23 13:04 Al2Klimov

@yhabteab Btw. no idea why you and me got different test results with notifications and checks, both(!) use only IsInside(). Maybe yet another rebase will fix your tests...

Al2Klimov avatar Apr 11 '23 13:04 Al2Klimov

I'm very interested for solving #8741. Thatswhy I try to bring focus back on this issue. I have some news which maybe would help for further debug:

This bug affected me already since icinga2-2.12.3-1.el7. With icinga2-2.13.2-1.el7 the bug was gone. On 2. May I updated to icinga2-2.13.3-1.el7 and the bug is back. Yesterday was the 1. May (holiday). Notifications arrived at 2. May in the morning with icinga2-2.13.2-1.el7, right after update to icinga2-2.13.3-1.el7, the notifications are missing again, as it treats 2. May as a holiday again.

@davixd Would you like to test whether this fix solves your specific problem? Here you are your Icinga version plus my three commits:

https://nextcloud.icinga.com/index.php/s/RMLGwtxc4bQsCqC

Al2Klimov avatar Apr 11 '23 15:04 Al2Klimov

@Al2Klimov Sure! Thanks for the work. I have at company a dev environment which is a equivalent to our prod environment. There I will be able to deploy the icinga version and to test it. I will be back with the results as fast as possible.

davixd avatar Apr 11 '23 23:04 davixd

@Al2Klimov What about demonstrating the issue/fix on the TimePeriod level as we discussed (preferably as a unit test)?

julianbrost avatar Apr 12 '23 11:04 julianbrost

I honestly -but abjectly- tried. This is hard-to-catch fish.

BOOST_AUTO_TEST_CASE(ak)
{
	TimePeriod::Ptr tp0900_1100 = new TimePeriod();
	tp0900_1100->SetName("tp0900_1100");
	tp0900_1100->SetRanges(new Dictionary({{ "thursday", "09:00-11:00" }}), true);

	TimePeriod::Ptr tp1000_1120 = new TimePeriod();
	tp1000_1120->SetName("tp1000_1120");
	tp1000_1120->SetRanges(new Dictionary({{ "thursday", "10:00-11:20" }}), true);

	TimePeriod::Ptr tp0000_2400 = new TimePeriod();
	tp0000_2400->SetName("tp0000_2400");
	tp0000_2400->SetRanges(new Dictionary({{ "thursday", "00:00-24:00" }}), true);
	tp0000_2400->SetIncludes(new Array({"tp0900_1100"}), true);
	tp0000_2400->SetExcludes(new Array({"tp1000_1120"}), true);

	TimePeriod::Ptr tps[] = {tp0900_1100, tp1000_1120, tp0000_2400};
	Function::Ptr update = new Function("LegacyTimePeriod", LegacyTimePeriod::ScriptFunc, {"tp", "begin", "end"});

	for (auto& tp : tps) {
		tp->SetUpdate(update, true);
		tp->Register();
	}

	Defer unregister ([&tps]() {
		for (auto& tp : tps) {
			tp->Unregister();
		}
	});

	for (auto& tp : tps) {
		tp->UpdateRegion(make_time_t("2023-04-06 00:00:00"), make_time_t("2023-04-07 00:00:00"), true);
	}

	BOOST_CHECK(!tp0000_2400->IsInside(make_time_t("2023-04-06 11:10:00")));
}

And you said yourself that it's unlikely the customer's problem. So let's cancel our attempt and demount the fishing rod.

Al2Klimov avatar Apr 12 '23 15:04 Al2Klimov

@Al2Klimov

the packages seems not to be updatable with the official repo from Icinga2 for CentOS7:

[18:59:27 INFRA root@monitoring-dev-1-1 7855c7]# ll
total 5020
-rw-r--r--. 1 root root   27932 Apr 11 15:37 icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 4815424 Apr 11 15:37 icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root  160456 Apr 11 15:37 icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root  121476 Apr 11 15:37 icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root    7204 Apr 11 15:37 vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
[18:59:28 INFRA root@monitoring-dev-1-1 7855c7]#
[18:57:04 INFRA root@monitoring-dev-1-1 7855c7]# yum localinstall *.rpm
Loaded plugins: fastestmirror, langpacks, rhnplugin, versionlock
This system is receiving updates from RHN Classic or Red Hat Satellite.
Examining icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64
vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Nothing to do

Installed on my CentOS 7.9 dev system is:

[19:00:37 INFRA root@monitoring-dev-1-1 7855c7]# rpm -qa |grep icinga2
icinga2-common-2.13.7-1.el7.x86_64
icinga2-bin-2.13.7-1.el7.x86_64
icinga2-2.13.7-1.el7.x86_64
vim-icinga2-2.13.7-1.el7.x86_64
icinga2-ido-mysql-2.13.7-1.el7.x86_64
[19:00:39 INFRA root@monitoring-dev-1-1 7855c7]#

Any suggestions?

davixd avatar Apr 12 '23 17:04 davixd

What's your current Icinga 2 version?

Al2Klimov avatar Apr 13 '23 07:04 Al2Klimov

icinga2-2.13.7-1.el7.x86_64

davixd avatar Apr 13 '23 08:04 davixd

At best you put all these in a new, persistent directory. Then run createrepo /new/persistent/directory. Finally create /etc/yum.repos.d/NAME_IS_UP_TO_YOU.repo like this:

[NAME_IS_UP_TO_YOU]
baseurl=file:///new/persistent/directory
enabled=1
gpgcheck=0

Now you should be able to upgrade just the icinga2 packages as usual and to undo the upgrade by yum history undo last before you disable that new repo again (if you wish).

Al2Klimov avatar Apr 13 '23 09:04 Al2Klimov

@Al2Klimov Your packages are based on 2.13.3, I don't see why that downgrade should be intended here (and that's why yum rejects them).

julianbrost avatar Apr 13 '23 09:04 julianbrost

I honestly -but abjectly- tried. This is hard-to-catch fish.

What is the code block with the test case trying to show? Does that test pass both with and without this PR?

julianbrost avatar Apr 13 '23 09:04 julianbrost