icinga2 icon indicating copy to clipboard operation
icinga2 copied to clipboard

ConfigItem: Fix infinite recursion caused by `ignore_on_error` when …

Open yhabteab opened this issue 3 years ago • 6 comments
trafficstars

…committing an item

When committing an item with ignore_on_error flag set fails, the Commit() method only returns nullptr and the current item is not being dropped from m_Items. CommittNewItems() also doesn't check the return value of Commit() but just continues and tries to commit all items from m_Items in recursive call. Since this corrupt item is never removed from m_Items, it ends up in an endless recursion till it finally crashes.

Test config

object Host "dummy-check" ignore_on_error {
    address = "localhost"
}

Before

see https://github.com/Icinga/icinga2/issues/8824#issuecomment-949757451

And....:

~/Workspace/icinga2 (master ✗) prefix/sbin/icinga2 daemon -C
[2022-06-20 14:38:46 +0200] information/cli: Icinga application loader (version: v2.13.0-312-g452252244; debug)
[2022-06-20 14:38:46 +0200] information/cli: Loading configuration file(s).
[2022-06-20 14:38:46 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 14:38:46 +0200] information/ApiListener: My API identity: satellite
[1]    74144 segmentation fault  prefix/sbin/icinga2 daemon -C

Performance

Check this out:

[2022-06-20 12:43:44 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 4, rate: 3.13333/s (188/min 324/5min 324/15min); empty in infinite time, your task handler isn't able to keep up
[2022-06-20 10:58:44 +0200] information/cli: Icinga application loader (version: v2.13.0-312-g452252244; debug)
[2022-06-20 10:58:44 +0200] information/cli: Loading configuration file(s).
[2022-06-20 10:58:50 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 10:58:50 +0200] information/ApiListener: My API identity: satellite
[2022-06-20 10:59:00 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-06-20 10:59:00 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:59:00 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:00:10 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 4, rate: 3.2/s (192/min 324/5min 324/15min); empty in 10 seconds
[2022-06-20 11:04:00 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 0, rate:  0/s (0/min 1576/5min 1708/15min);
[2022-06-20 11:04:00 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:04:00 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 50012 Notifications.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 100011 Services.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 50001 Hosts.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 4 Zones.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2022-06-20 11:04:26 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/CLionProjects/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-20 11:04:26 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  504.89s user 18.22s system 150% cpu 5:48.26 total

After

[2022-06-24 10:21:57 +0200] notice/ConfigObject: Ignoring config object 'dummy-check' of type 'Host' due to errors: Error: Validation failed for object 'dummy-check' of type 'Host'; Attribute 'check_command': Attribute must not be empty.
Location: in /Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf: 27:1-27:41
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(25): }
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(26): 
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(27): object Host "dummy-check" ignore_on_error {
                                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(28):   address = "localhost"
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(29): }
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 11 Notifications.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 10 Services.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 2 ApiUsers.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 Host.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-24 10:21:57 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-24 10:21:57 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'DaemonUtility::LoadConfigFiles'
[2022-06-24 10:21:57 +0200] information/cli: Finished validating the configuration file(s)

Performance

 time prefix/sbin/icinga2 daemon -C    
[2022-06-20 10:55:23 +0200] information/cli: Icinga application loader (version: v2.13.0-313-ge8cd42d12; debug)
[2022-06-20 10:55:23 +0200] information/cli: Loading configuration file(s).
[2022-06-20 10:55:29 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 10:55:29 +0200] information/ApiListener: My API identity: yonass-mbp.fritz.box
[2022-06-20 10:55:39 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.33333/s (140/min 140/5min 140/15min);
[2022-06-20 10:55:39 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:55:39 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 ApiUsers.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 50001 Hosts.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 50012 Notifications.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 100011 Services.
[2022-06-20 10:58:03 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-20 10:58:03 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  444.81s user 9.16s system 271% cpu 2:47.24 total

fixes #8824

yhabteab avatar Jun 20 '22 10:06 yhabteab

Please update the PR description to specify the config that was used in each case and clearly state when a process did not terminate when it should or when it crashed.

Also you have a typo in the PR title and commit message, you might want to fix that as well.

julianbrost avatar Jun 20 '22 11:06 julianbrost

Please update the PR description to specify the config that was used in each case and clearly state when a process did not terminate when it should or when it crashed.

Updated! However, right now it just crashes, though you can check the referenced issue comment to see this repetition over and over again. You can also test the config snippet in notice mode (icinga2 daemon -C -x notice) and you will see the validation error for a few seconds before it crashes.

yhabteab avatar Jun 20 '22 12:06 yhabteab

One thing I don't yet understand: Minimal example to reproduce this seems to be something like object Host "x" ignore_on_error {}, however when doing object Service "x" ignore_on_error {}, things work perfectly fine here. What's the difference between Host and Service here? Both fail the validation due to missing check_command.

julianbrost avatar Jul 14 '22 13:07 julianbrost

The difference between Host and Service objects exists because services are considered "unnamed items" (not because they don't have a name, but at that stage of config loading, the name is not yet known because it depends on the host name which only is known after evaluating the service body). And items are handled differently based on whether they are considered named. If they aren't, committing is tried only once (due to the swap at the end):

https://github.com/Icinga/icinga2/blob/32c7f7730db154ba0dff5856a8985d125791c73e/lib/config/configitem.cpp#L407-L421

So unregistering named items seems to indeed be the correct thing to do here.

julianbrost avatar Jul 20 '22 10:07 julianbrost

Please also share the config you used for your performance test.

prefix/sbin/icinga2 daemon -C 504.89s user 18.22s system 150% cpu 5:48.26 total

prefix/sbin/icinga2 daemon -C 444.81s user 9.16s system 271% cpu 2:47.24 total

I don't think the difference in these numbers is plausible for your change on the newItems vector. After all, the the number of append operations should stay identical (unless you're discarding lots of objects with ignore_on_error) and also unlikely to explain 60s in CPU time difference, so I rather expect there was some strong background noise while you tested this.

The change to newItems also makes a slight difference in behavior: if something is skipped, it is now no longer added there (which sounds desirable).

julianbrost avatar Jul 20 '22 13:07 julianbrost

Please also share the config you used for your performance test.

Config
object CheckCommand "dummy" {
  import "dummy-check-command"

  vars.dummy_state = 0
  vars.dummy_text = "Check was successful."
}

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s

  check_command = "dummy"
}

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
}

object Host NodeName {
  import "generic-host"

  address = "127.0.0.1"
  address6 = "::1"

  vars.os = "Linux"
}

for (idx in range(20000)) {
  if ((idx % 2) == 0) {
    var hostname = "icinga-" + string(idx)
  } else {
    var hostname = "dummy-" + string(idx)
  }

  object Host hostname {
    import "generic-host"

    address = "127.0.0.1"

    if (match("*icinga*", this.name)) {
      address = "10.211.55.31"
    }

    vars.is_dummy = true
  }
}

for (i in range(10)) {
  apply Service "random" + i {
    import "generic-service"
    check_command = "dummy"

    assign where true
  }
}

Tests:

Before
~/Workspace/icinga2 (bugfix/handle-ignore-on-error-uses-cases-properly ✗) time prefix/sbin/icinga2 daemon -C
[2022-09-07 11:07:43 +0200] information/cli: Icinga application loader (version: v2.13.0-429-g86b63a57a; debug)
[2022-09-07 11:07:43 +0200] information/cli: Loading configuration file(s).
[2022-09-07 11:07:44 +0200] information/ConfigItem: Committing config item(s).
[2022-09-07 11:07:44 +0200] information/ApiListener: My API identity: yonass-mbp.int.netways.de
[2022-09-07 11:07:54 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-09-07 11:07:54 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:07:54 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 CheckCommand.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 20001 Hosts.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 200010 Services.
[2022-09-07 11:10:52 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-09-07 11:10:52 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  494,60s user 9,12s system 261% cpu 3:12,90 total
After
~/Workspace/icinga2 (bugfix/handle-ignore-on-error-uses-cases-properly ✗) time prefix/sbin/icinga2 daemon -C
[2022-09-07 11:20:49 +0200] information/cli: Icinga application loader (version: v2.13.0-431-g22825535d; debug)
[2022-09-07 11:20:49 +0200] information/cli: Loading configuration file(s).
[2022-09-07 11:20:51 +0200] information/ConfigItem: Committing config item(s).
[2022-09-07 11:20:51 +0200] information/ApiListener: My API identity: yonass-mbp.int.netways.de
[2022-09-07 11:21:01 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-09-07 11:21:01 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:21:01 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 CheckCommand.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 20001 Hosts.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 200010 Services.
[2022-09-07 11:23:39 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-09-07 11:23:39 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  482,10s user 7,73s system 282% cpu 2:53,49 total

yhabteab avatar Sep 07 '22 09:09 yhabteab

By the way, I didn't really consider the claimed performance improvement in the review as this PR isn't about performance and it doesn't look like it makes things slower. But I invite you to try if you can actually reproduce your numbers, i.e. run your test a few more times and see if one is indeed consistently faster.

julianbrost avatar Oct 12 '22 14:10 julianbrost