TimePeriod: consult GetIncludes() and GetExcludes() in IsInside(), not UpdateRegion()
~fixes~ doesn't fix #7398
Edit
The solution
Two of the three commits just remove unused methods. 769de8d63cbc8af2e1f48878cfa1ab8a2683df14 is your friend.
Instead of (error-prone) merging a time period's own calculated segments (e.g. 1640000000 - 1670000000) with those of its included/excluded time periods (e.g. 1660000000 - 1690000000) and using the merge result in a single IsInside() call, this PR keeps each time period's own calculated segments isolated from the others.
As a consequence, IsInside() now has to call the included/excluded time periods' IsInside() as well.
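As a rough sketch of the isolated lookup (the class layout, member and method names below are illustrative stand-ins, not icinga2's actual API; the prefer_includes tie-break follows the documented default of true):

```cpp
#include <cassert>
#include <vector>

// Toy model of the "isolated segments" approach: each period answers
// IsInside() from its own calculated segments only and delegates to the
// periods it includes/excludes, instead of merging their segments into one.
struct Segment { double begin, end; };

struct Period {
	std::vector<Segment> segments;       // this period's own calculated ranges
	std::vector<const Period*> includes;
	std::vector<const Period*> excludes;
	bool preferIncludes = true;          // Icinga's documented default

	bool InOwnSegments(double ts) const {
		for (const auto& s : segments)
			if (s.begin <= ts && ts < s.end)
				return true;
		return false;
	}

	bool IsInside(double ts) const {
		bool included = false, excluded = false;
		for (const auto* p : includes)
			included = included || p->IsInside(ts);
		for (const auto* p : excludes)
			excluded = excluded || p->IsInside(ts);

		if (included && excluded)
			return preferIncludes;       // both match: tie-break applies
		if (excluded)
			return false;
		if (included)
			return true;
		return InOwnSegments(ts);        // neither applies: own segments decide
	}
};
```

Each recursive IsInside() call consults the referenced period's own, freshly calculated segments at query time, so no stale merge result can be cached in the parent.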
~The problem~
~IMAO Yonas already summarised the problem clearly in https://github.com/Icinga/icinga2/pull/7855#issuecomment-1162817801 .~
Edit II
"error prone" means that the order of periodically updated timeperiods matter. I.e. ideally the timeperiods used for including/excluding are updated first, then the ones refering to them. In the opposite case you have a problem as the computed ranges of parent_TP are merged with the computed ranges of child_TP before the latter are updated.
@cla-bot check
And maybe it also fixes #8741.
I'm very interested in solving #8741. That's why I'm trying to bring focus back to this issue. I have some news which might help with further debugging:
This bug had already affected me since icinga2-2.12.3-1.el7. With icinga2-2.13.2-1.el7 the bug was gone. On 2 May I updated to icinga2-2.13.3-1.el7 and the bug is back. Yesterday was 1 May (a holiday). Notifications arrived on the morning of 2 May with icinga2-2.13.2-1.el7; right after the update to icinga2-2.13.3-1.el7 the notifications went missing again, as it treats 2 May as a holiday again.
Test cases for https://github.com/Icinga/icinga2/issues/7398
Config
object TimePeriod "24x7" {
  ranges = { wednesday = "00:00-23:59" }
}

object TimePeriod "offDuty" {
  ranges = { wednesday = "10:00-17:00" }
}

object TimePeriod "preferred_tp" {
  ranges = { wednesday = "09:00-11:00" }
}

object TimePeriod "onDuty" {
  excludes = ["offDuty"]
  //includes = ["preferred_tp"]
  ranges = { wednesday = "00:00-23:59" }
}

object User "john" {
  email = "[email protected]"
  period = "24x7"
}

object Host "test" {
  address = "127.0.0.1"
  check_command = "hostalive"
}

apply Notification "notify-hosts" to Host {
  command = "mail-host-notification"
  period = "onDuty"
  users = ["john"]
  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  assign where true
}
Before
[2022-06-22 09:57:51 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 09:57:51 +0200] debug/Notification: Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 09:57:51 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 09:57:51 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2022-06-22 09:57:51 +0200] information/Notification: Sending 'DowntimeStart' notification 'test!notify-hosts' for user 'john'
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2022-06-22 09:57:51 +0200] information/Downtime: Triggering downtime 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d' for checkable 'test'.
[2022-06-22 09:57:51 +0200] debug/ApiListener: Sent update for object 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d': {"config":"object Downtime \"d8b19bc4-25ed-4a5b-8372-fae70be6473d\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655888271.000000\n\tentry_time = 1655884671.609968\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655884671.000000\n\ttriggered_by = \"\"\n\tversion = 1655884671.610063\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!d8b19bc4-25ed-4a5b-8372-fae70be6473d","original_attributes":[],"type":"Downtime","version":1655884671.610063,"zone":"master"}
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'config::UpdateObject' message
[2022-06-22 09:57:51 +0200] information/ConfigObjectUtility: Created and activated object 'test!d8b19bc4-25ed-4a5b-8372-fae70be6473d' of type 'Downtime'.
[2022-06-22 09:57:51 +0200] notice/Process: Running command '/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/scripts/mail-host-notification.sh' '-4' '127.0.0.1' '-6' '' '-b' 'icingaadmin' '-c' 'Cluster upgrade maintenance' '-d' '2022-06-22 09:57:51 +0200' '-l' 'test' '-n' 'test' '-o' 'execvpe(/Users/yhabteab/Workspace/icinga2/prefix/lib/nagios/plugins/check_ping) failed: No such file or directory' '-r' '[email protected]' '-s' 'DOWN' '-t' 'DOWNTIMESTART': PID 48059
[2022-06-22 09:57:51 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'ConfigObjectUtility::CreateObject'
[2022-06-22 09:57:51 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2022-06-22 09:57:51 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
After
[2022-06-22 10:33:58 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2022-06-22 10:33:58 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message
[2022-06-22 10:33:58 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 10:33:58 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'
[2022-06-22 10:33:58 +0200] information/Downtime: Triggering downtime 'test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166' for checkable 'test'.
[2022-06-22 10:33:58 +0200] debug/ApiListener: Sent update for object 'test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166': {"config":"object Downtime \"61b2d5f8-9de9-4403-a51d-1c6ca87cf166\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655890437.000000\n\tentry_time = 1655886838.060954\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655886837.000000\n\ttriggered_by = \"\"\n\tversion = 1655886838.061191\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!61b2d5f8-9de9-4403-a51d-1c6ca87cf166","original_attributes":[],"type":"Downtime","version":1655886838.061191,"zone":"master"}
Now, uncomment the includes = ["preferred_tp"] attribute of the onDuty TimePeriod.
[2022-06-22 10:35:44 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::SetForceNextNotification' message
[2022-06-22 10:35:44 +0200] notice/Notification: Attempting to send notifications of type 'DowntimeStart' for notification object 'test!notify-hosts'.
[2022-06-22 10:35:44 +0200] debug/Notification: Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 10:35:44 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', Type 'DowntimeStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=1, TypeFilter=511)
[2022-06-22 10:35:44 +0200] debug/Notification: User 'john' notification 'test!notify-hosts', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2022-06-22 10:35:44 +0200] information/Notification: Sending 'DowntimeStart' notification 'test!notify-hosts' for user 'john'
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2022-06-22 10:35:44 +0200] information/Downtime: Triggering downtime 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb' for checkable 'test'.
[2022-06-22 10:35:44 +0200] debug/ApiListener: Sent update for object 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb': {"config":"object Downtime \"c8039a8c-2824-4e60-b315-5c7a14d175fb\" ignore_on_error {\n\tauthor = \"icingaadmin\"\n\tcomment = \"Cluster upgrade maintenance\"\n\tconfig_owner = \"\"\n\tduration = 0.000000\n\tend_time = 1655890543.000000\n\tentry_time = 1655886944.024694\n\tfixed = true\n\thost_name = \"test\"\n\tparent = \"\"\n\tscheduled_by = \"\"\n\tstart_time = 1655886943.000000\n\ttriggered_by = \"\"\n\tversion = 1655886944.024791\n\tzone = \"master\"\n}\n","modified_attributes":{},"name":"test!c8039a8c-2824-4e60-b315-5c7a14d175fb","original_attributes":[],"type":"Downtime","version":1655886944.024791,"zone":"master"}
[2022-06-22 10:35:44 +0200] information/ConfigObjectUtility: Created and activated object 'test!c8039a8c-2824-4e60-b315-5c7a14d175fb' of type 'Downtime'.
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'config::UpdateObject' message
[2022-06-22 10:35:44 +0200] notice/Process: Running command '/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/scripts/mail-host-notification.sh' '-4' '127.0.0.1' '-6' '' '-b' 'icingaadmin' '-c' 'Cluster upgrade maintenance' '-d' '2022-06-22 10:35:44 +0200' '-l' 'test' '-n' 'test' '-o' 'execvpe(/Users/yhabteab/Workspace/icinga2/prefix/lib/nagios/plugins/check_ping) failed: No such file or directory' '-r' '[email protected]' '-s' 'DOWN' '-t' 'DOWNTIMESTART': PID 70037
[2022-06-22 10:35:44 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'ConfigObjectUtility::CreateObject'
[2022-06-22 10:35:44 +0200] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2022-06-22 10:35:44 +0200] information/Notification: Completed sending 'DowntimeStart' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
Running tests for https://github.com/Icinga/icinga2/issues/8741: http://10.27.0.228/icingaweb2/dashboard
ref/IP/44756
The code change looks non-trivial and the referenced issue is quite lengthy, so please add a summary of the problem and how this PR fixes it to the description.
IMAO Yonas already summarised the problem clearly in #7855 (comment) .
That comment only contains a test. Or do you consider the example configuration to be a sufficient description of the problem?
Apart from that, I can't really follow that test either. Let's consider the time periods on 2022-06-22 (which indeed was a Wednesday) at 09:57:51:
- offDuty: it's not 10:00-17:00, so we are outside of that time period
- onDuty: it's inside 00:00-23:59 and outside of the excluded offDuty time period, so we are inside onDuty
And according to the "Before" log, a notification gets sent, which looks correct to me.
Now the "After" log is from 10:33:58, which means the offDuty period was entered, therefore the exclude in onDuty takes effect and no notification is sent, as it's now outside this time period; so that also looks correct to me.
For the second "After" test, the time period preferred_tp becomes relevant. It's a time between 09:00-11:00, so we are inside that time period. onDuty now additionally includes this time period, so that for the current time it has both active includes and excludes. prefer_includes is not given explicitly, therefore the default true is used, i.e. we are inside onDuty. And indeed, the notification gets sent, which again looks correct to me.
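For reference, the tie-break relied on here can be spelled out explicitly in the config (prefer_includes is a regular TimePeriod attribute; true is the default):

```
object TimePeriod "onDuty" {
  excludes = ["offDuty"]
  includes = ["preferred_tp"]
  prefer_includes = true // the default; set to false to let excludes win
  ranges = { wednesday = "00:00-23:59" }
}
```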
So this test contains only correct cases and doesn't demonstrate any change. Apart from that, the test is strange because there was a time period transition between the times at which the old and the new version were tested.
And from a pure code design PoV?
Instead of (error-prone) merging a time period's own calculated segments (e.g. 1640000000 - 1670000000) with those of its included/excluded time periods (e.g. 1660000000 - 1690000000) and using the merge result in a single IsInside() call, this PR keeps each time period's own calculated segments isolated from the others.
As a consequence, IsInside() now has to call the included/excluded time periods' IsInside() as well.
@Al2Klimov Please provide proof what has been actually fixed, i.e. the configuration from the Icinga ticket. Also, please verify whether this fixes #8741.
Let me just describe this briefly here. In order to avoid time period transitions when testing both branches like last time, this time I have run both in parallel. The configuration I used is the same as in https://github.com/Icinga/icinga2/pull/7855#issuecomment-1162817801, only the time ranges have been adjusted.
- offDuty: ranges thursday = "10:00-17:00"
- preferred_tp: ranges thursday = "13:00-15:00"

and the onDuty time period excludes offDuty.
object TimePeriod "onDuty" {
  excludes = ["offDuty"]
  ranges = { thursday = "00:00-23:59" }
}
As a result, there is some strange behaviour in both branches. Sometimes the time period gets excluded correctly and sometimes not.
Master: Where the offDuty period doesn't get excluded.
[2023-03-30 12:50:35 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:50:35 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:50:35 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 12:50:35 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
Working Master:
[2023-03-30 12:52:54 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:52:54 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:52:54 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'
This PR: Where the offDuty period doesn't get excluded.
[2023-03-30 12:50:33 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:50:33 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:50:33 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 12:50:33 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
This PR: where the offDuty is correctly applied.
[2023-03-30 12:52:53 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 12:52:53 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 12:52:53 +0200] notice/Notification: Not sending notifications for notification object 'test!notify-hosts': not in timeperiod 'onDuty'
Now, include preferred_tp in the onDuty time period (includes = ["preferred_tp"]):
Working master:
[2023-03-30 13:08:44 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 13:08:44 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 13:08:44 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 13:08:44 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
This PR:
[2023-03-30 13:08:45 +0200] information/Checkable: Checkable 'test' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-03-30 13:08:45 +0200] notice/Notification: Attempting to send notifications of type 'Problem' for notification object 'test!notify-hosts'.
[2023-03-30 13:08:45 +0200] information/Notification: Sending 'Problem' notification 'test!notify-hosts' for user 'john'
[2023-03-30 13:08:45 +0200] information/Notification: Completed sending 'Problem' notification 'test!notify-hosts' for checkable 'test' and user 'john' using command 'mail-host-notification'.
Conclusion
This PR doesn't fix any problem, but it doesn't introduce any different behaviour either. My tests show that the behaviour remains exactly the same in both branches; nothing has changed with this PR. Also, I could reproduce this issue only partially in both branches, i.e. sometimes the offDuty period is not considered at all, and after deleting the state file and restarting Icinga 2 it works fine.
What changed between
Master: Where the offDuty period doesn't get excluded.
and
Working Master:
as well as between
This PR: Where the offDuty period doesn't get excluded.
and
This PR: where the offDuty is correctly applied.
Are you trying to say that both show inconsistent behavior in the same test?
when deleting the state file and restarting Icinga2 it works fine.
Okay, time periods indeed seem to persist some state:
https://github.com/Icinga/icinga2/blob/82f9e4312da50670cf263cdd9f0f2ea4578e3fee/lib/icinga/timeperiod.ti#L33-L35
So it looks like you have to be extra careful with tests so that no old and potentially incorrect (or inconsistent with the current implementation) state is used.
What changed between
Master: Where the offDuty period doesn't get excluded.
and
Working Master:
as well as between
This PR: Where the offDuty period doesn't get excluded.
and
This PR: where the offDuty is correctly applied.
Nothing! I just removed the state file and started icinga2 again.
Are you trying to say that both show inconsistent behavior in the same test?
Weren't the tests clear enough? And I'm not trying to say it, I'm proving it.
OK, you've really almost made me think all of this is a quality-only change.
Actually
Test 1
master, f3b148517
Config
globals.everyday = function(t) {
  return {
    "sunday" = t
    "monday" = t
    "tuesday" = t
    "wednesday" = t
    "thursday" = t
    "friday" = t
    "saturday" = t
  }
}
include <itl>
object CheckerComponent "cc" { }
object NotificationComponent "nc" { }
object TimePeriod "10-18" {
  ranges = everyday("10:00-11:20")
}

object TimePeriod "9-11" {
  ranges = everyday("9:00-11:00")
}

object TimePeriod "0-24" {
  excludes = [ "10-18" ]
  includes = [ "9-11" ]
  ranges = everyday("00:00-24:00")
}

object Host "h" {
  check_command = "dummy"
  vars.dummy_state = 2
  check_interval = 10
  retry_interval = 10
  volatile = true
  check_period = "0-24"
}

object NotificationCommand "nc" {
  command = [ "true" ]
}

object User "u" {
}

object Notification "n" {
  host_name = "h"
  command = "nc"
  users = [ "u" ]
}
Logs
[2023-04-06 11:15:52 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:16:02 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:02 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:02 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:16:11 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:11 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:11 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Fires too early.
[2023-04-06 11:18:49 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:20:04 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:04 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:04 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:20:13 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:13 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:13 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Still too early.
Test 2
master (f3b148517) + #7855
Config diff
object TimePeriod "10-18" {
ranges = { "monday - friday" = "10:00-11:15" }
ranges = everyday("10:00-11:35")
}
Logs
[2023-04-06 11:32:23 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:35:14 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:35:14 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:35:14 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:35:24 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:35:24 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:35:24 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Just in time!
Test 3
master (f3b148517) + #7855
Config diff
object TimePeriod "10-18" {
ranges = { "monday - friday" = "10:00-11:15" }
ranges = everyday("10:00-11:40")
}
Logs
[2023-04-06 11:37:15 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:40:17 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:40:17 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:40:17 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:40:27 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:40:27 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:40:27 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Just in time!
Conclusion
Let's not fixate too much on the specific config(s) in #7398. Its title and description say it all: TimePeriods behave strangely. And this PR fixes at least one strange behavior. Let's let the various issue creators upgrade to 2.14 and test whether it fixes their cases as well.
Even if I'm wrong, this is a good code quality change. Instead of merging anything into everything, every TP manages just its own stuff. Less interference, fewer problems.
Logs
[2023-04-06 11:15:52 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:16:02 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:02 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:02 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:16:11 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:16:11 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:16:11 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Fires too early.
[2023-04-06 11:18:49 +0200] information/ConfigItem: Activated all objects.
[2023-04-06 11:20:04 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:04 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:04 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
[2023-04-06 11:20:13 +0200] information/Checkable: Checkable 'h' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2023-04-06 11:20:13 +0200] information/Notification: Sending 'Problem' notification 'h!n' for user 'u'
[2023-04-06 11:20:13 +0200] information/Notification: Completed sending 'Problem' notification 'h!n' for checkable 'h' and user 'u' using command 'nc'.
Still too early.
What is "Still too early." in the second output? The excluded time period 10-18 ended at 11:20 (great naming!) so after that time, it's inside 0-24 again and the second output looks fine to me.
Also what was the difference between these two runs? Restart and maybe something with the state file?
Config diff
object TimePeriod "10-18" {
  ranges = { "monday - friday" = "10:00-11:15" }
  ranges = everyday("10:00-11:35")
}
Are those missing + and - line prefixes to actually be diff syntax? Or why are you setting ranges twice here (overwriting the first line)? Also, why everyday(...) now instead of "monday - friday"? Does this make a difference here?
What is "Still too early." in the second output? The excluded time period 10-18 ended at 11:20 (great naming!) so after that time, it's inside 0-24 again and the second output looks fine to me.
Oh. Sorry, 🐘 oversight.
Also what was the difference between these two runs? Restart and maybe something with the state file?
Restart with untouched state file.
Config diff
object TimePeriod "10-18" {
  ranges = { "monday - friday" = "10:00-11:15" }
  ranges = everyday("10:00-11:35")
}
Are those missing + and - line prefixes to actually be diff syntax? Or why are you setting ranges twice here (overwriting the first line)? Also, why everyday(...) now instead of "monday - friday"? Does this make a difference here?
- I meant that those are the objects I've changed, i.e. the difference to the previous config.
- Forget the first ranges line; it's a no-op and shouldn't be there.
Does that affect your conclusion? I'm still not entirely sure if this is supposed to fix something or just clean up code.
Also what was the difference between these two runs? Restart and maybe something with the state file?
Restart with untouched state file.
And what about Test 2 and Test 3? Did these start with the state file from the previous test?
Does that affect your conclusion? I'm still not entirely sure if this is supposed to fix something or just clean up code.
My conclusion stands. In any case it cleans up the code very well. In my opinion it also fixes something, as described above. It definitely does no harm.
And what about Test 2 and Test 3? Did these start with the state file from the previous test?
No, with none at all.
Intermediate result
The only thing changed here, TimePeriod, doesn't have a get-current/next/whatever interface based on the segments, so only IsInside() matters. IMAO there's no possibility of harm.
@yhabteab Btw. no idea why you and I got different test results with notifications vs. checks; both(!) use only IsInside(). Maybe yet another rebase will fix your tests...
I'm very interested in solving #8741. That's why I'm trying to bring focus back to this issue. I have some news which might help with further debugging:
This bug had already affected me since icinga2-2.12.3-1.el7. With icinga2-2.13.2-1.el7 the bug was gone. On 2 May I updated to icinga2-2.13.3-1.el7 and the bug is back. Yesterday was 1 May (a holiday). Notifications arrived on the morning of 2 May with icinga2-2.13.2-1.el7; right after the update to icinga2-2.13.3-1.el7 the notifications went missing again, as it treats 2 May as a holiday again.
@davixd Would you like to test whether this fix solves your specific problem? Here is your Icinga version plus my three commits:
https://nextcloud.icinga.com/index.php/s/RMLGwtxc4bQsCqC
@Al2Klimov Sure! Thanks for the work. I have a dev environment at my company which is equivalent to our prod environment. There I will be able to deploy that Icinga version and test it. I will get back with the results as soon as possible.
@Al2Klimov What about demonstrating the issue/fix on the TimePeriod level as we discussed (preferably as a unit test)?
I honestly, but unsuccessfully, tried. This is a hard-to-catch fish.
BOOST_AUTO_TEST_CASE(ak)
{
	TimePeriod::Ptr tp0900_1100 = new TimePeriod();
	tp0900_1100->SetName("tp0900_1100");
	tp0900_1100->SetRanges(new Dictionary({{ "thursday", "09:00-11:00" }}), true);

	TimePeriod::Ptr tp1000_1120 = new TimePeriod();
	tp1000_1120->SetName("tp1000_1120");
	tp1000_1120->SetRanges(new Dictionary({{ "thursday", "10:00-11:20" }}), true);

	TimePeriod::Ptr tp0000_2400 = new TimePeriod();
	tp0000_2400->SetName("tp0000_2400");
	tp0000_2400->SetRanges(new Dictionary({{ "thursday", "00:00-24:00" }}), true);
	tp0000_2400->SetIncludes(new Array({"tp0900_1100"}), true);
	tp0000_2400->SetExcludes(new Array({"tp1000_1120"}), true);

	TimePeriod::Ptr tps[] = { tp0900_1100, tp1000_1120, tp0000_2400 };
	Function::Ptr update = new Function("LegacyTimePeriod", LegacyTimePeriod::ScriptFunc, {"tp", "begin", "end"});

	for (auto& tp : tps) {
		tp->SetUpdate(update, true);
		tp->Register();
	}

	Defer unregister ([&tps]() {
		for (auto& tp : tps) {
			tp->Unregister();
		}
	});

	for (auto& tp : tps) {
		tp->UpdateRegion(make_time_t("2023-04-06 00:00:00"), make_time_t("2023-04-07 00:00:00"), true);
	}

	BOOST_CHECK(!tp0000_2400->IsInside(make_time_t("2023-04-06 11:10:00")));
}
And you said yourself that it's unlikely to be the customer's problem. So let's cancel our attempt and pack up the fishing rod.
@Al2Klimov
the packages don't seem to be installable as an update over the official Icinga 2 repo packages for CentOS 7:
[18:59:27 INFRA root@monitoring-dev-1-1 7855c7]# ll
total 5020
-rw-r--r--. 1 root root 27932 Apr 11 15:37 icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 4815424 Apr 11 15:37 icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 160456 Apr 11 15:37 icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 121476 Apr 11 15:37 icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
-rw-r--r--. 1 root root 7204 Apr 11 15:37 vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm
[18:59:28 INFRA root@monitoring-dev-1-1 7855c7]#
[18:57:04 INFRA root@monitoring-dev-1-1 7855c7]# yum localinstall *.rpm
Loaded plugins: fastestmirror, langpacks, rhnplugin, versionlock
This system is receiving updates from RHN Classic or Red Hat Satellite.
Examining icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-bin-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-common-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64
icinga2-ido-mysql-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Examining vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64
vim-icinga2-2.13.3+3.g5c20af2c3-1.el7.x86_64.rpm: does not update installed package.
Nothing to do
Installed on my CentOS 7.9 dev system is:
[19:00:37 INFRA root@monitoring-dev-1-1 7855c7]# rpm -qa |grep icinga2
icinga2-common-2.13.7-1.el7.x86_64
icinga2-bin-2.13.7-1.el7.x86_64
icinga2-2.13.7-1.el7.x86_64
vim-icinga2-2.13.7-1.el7.x86_64
icinga2-ido-mysql-2.13.7-1.el7.x86_64
[19:00:39 INFRA root@monitoring-dev-1-1 7855c7]#
Any suggestions?
What's your current Icinga 2 version?
icinga2-2.13.7-1.el7.x86_64
Best put all of these into a new, persistent directory. Then run createrepo /new/persistent/directory. Finally, create /etc/yum.repos.d/NAME_IS_UP_TO_YOU.repo like this:
[NAME_IS_UP_TO_YOU]
baseurl=file:///new/persistent/directory
enabled=1
gpgcheck=0
Now you should be able to upgrade just the icinga2 packages as usual, and to undo the upgrade with yum history undo last before disabling the new repo again (if you wish).
@Al2Klimov Your packages are based on 2.13.3, I don't see why that downgrade should be intended here (and that's why yum rejects them).
I honestly, but unsuccessfully, tried. This is a hard-to-catch fish.
What is the code block with the test case trying to show? Does that test pass both with and without this PR?