Icinga 2.14.0 no longer synchronizes modified attributes to satellites (regression from 2.13.7)

Open jan-kantert opened this issue 2 years ago • 5 comments

Describe the bug

Since we upgraded to 2.14.0, we experience issues with our satellites where "old" values reappear. It seems that changes in modified-attributes.conf are no longer synchronized to the satellites. Services on the satellites use the values listed under original_attributes in IcingaWeb2 (even though the correct ones are shown in the UI). As a consequence, checks use incorrect/old vars.

This happens with existing/upgraded satellites. It also occurs on new satellites. We create all our services via API so this happens quite frequently for us and very reproducibly.

We verified that all files in /var/lib/icinga2/api are the same on master and satellite. Files in /var/lib/icinga2/api do not contain any modified attributes (neither on master nor on the satellites). Modified attributes can be found in /var/lib/icinga2/modified-attributes.conf on the master only.
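One way to make the comparison above repeatable is to hash the directory tree on each node and diff the results. This is only a sketch, not the procedure we actually used: `tree_manifest` and `diff_manifests` are hypothetical helper names, and it assumes you run the first function on each node (for example over /var/lib/icinga2/api or a subdirectory both nodes share) and bring the two results together yourself.

```python
import hashlib
from pathlib import Path


def tree_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(master, satellite):
    """Return relative paths that are missing on one side or differ in content."""
    return sorted(
        rel
        for rel in master.keys() | satellite.keys()
        if master.get(rel) != satellite.get(rel)
    )
```

An empty list from `diff_manifests` means the compared trees are byte-identical, which is what we observed for the synced config while the modified attributes were still missing on the satellite.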

This worked fine in 2.13.7 (did not try 2.13.8 yet). In 2.13.7 modified attributes would be merged into the files in /var/lib/icinga2/api on the satellite.

To Reproduce

  1. Install master
  2. Add service via API
  3. Change service via API (i.e. change vars)
  4. Add satellite
  5. Observe check execution

Satellite will use the vars from step (2) but not from step (3).
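Steps (2) and (3) can be sketched as two calls against the Icinga 2 REST API: a PUT to create the service and a POST to change its vars. The snippet below only builds the requests and does not send them. `BASE_URL`, `object_request`, the host/service names `h` and `s`, and the `root:icinga` credentials are placeholders (borrowed from the reproduction further down in this thread), not values from our environment.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://127.0.0.1:5665"  # assumption: API listener on the master


def object_request(method, host, service, attrs,
                   base_url=BASE_URL, user="root", password="icinga"):
    """Build (but do not send) a request for /v1/objects/services/<host>!<service>."""
    name = urllib.parse.quote(f"{host}!{service}", safe="!")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/v1/objects/services/{name}",
        data=json.dumps({"attrs": attrs}).encode(),
        method=method,
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Step 2: create the service with an initial value (the host must already exist).
create = object_request(
    "PUT", "h", "s", {"check_command": "dummy", "vars.dummy_text": "WRONG"}
)

# Step 3: change the value afterwards; this is the change that does not
# reach the satellite in our setup.
modify = object_request("POST", "h", "s", {"vars.dummy_text": "RIGHT"})
```

To actually send them, pass each request to `urllib.request.urlopen` together with an `ssl.SSLContext` that trusts the Icinga CA of the instance.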

Expected behavior

Satellite will use changed vars from step (3).

Your Environment

  • Version used (icinga2 --version):
$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: v2.14.0)

Copyright (c) 2012-2023 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 11 (bullseye)
  Kernel: Linux
  Kernel version: 6.2.0-1011-aws
  Architecture: x86_64

Build information:
  Compiler: GNU 10.2.1
  Build host: buildkitsandbox
  OpenSSL version: OpenSSL 1.1.1n  15 Mar 2022

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
  • Operating System and version: Debian bullseye (from official docker image - see above)
  • Enabled features (icinga2 feature list):
    • Satellites: Enabled features: api checker notification syslog
    • Master: Enabled features: api checker icingadb notification syslog
  • Icinga Web 2 version and modules (System - About):
  • Config validation (icinga2 daemon -C): On master:
$ icinga2 daemon -C
[2023-09-27 20:25:58 +0000] information/cli: Icinga application loader (version: v2.14.0)
[2023-09-27 20:25:58 +0000] information/cli: Loading configuration file(s).
[2023-09-27 20:26:00 +0000] information/ConfigItem: Committing config item(s).
[2023-09-27 20:26:00 +0000] information/ApiListener: My API identity: xxx
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 SyslogLogger.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 6762 Dependencies.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 35 Comments.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 71 Users.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 405 ServiceGroups.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 8214 Services.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 14 Zones.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 3 NotificationCommands.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 8215 Notifications.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 12 Hosts.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 12 Endpoints.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1771 Downtimes.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 16 ApiUsers.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 263 CheckCommands.
[2023-09-27 20:26:01 +0000] information/ConfigItem: Instantiated 1 IcingaDB.
[2023-09-27 20:26:01 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-09-27 20:26:01 +0000] information/cli: Finished validating the configuration file(s).

On a fresh satellite y:

$ icinga2 daemon -C
[2023-09-27 20:23:57 +0000] information/cli: Icinga application loader (version: v2.14.0)
[2023-09-27 20:23:57 +0000] information/cli: Loading configuration file(s).
[2023-09-27 20:23:57 +0000] information/ConfigItem: Committing config item(s).
[2023-09-27 20:23:57 +0000] information/ApiListener: My API identity: yyy
[2023-09-27 20:23:58 +0000] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 SyslogLogger.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1641 Dependencies.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 35 Users.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 112 ServiceGroups.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1757 Services.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 4 Zones.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 3 NotificationCommands.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 3511 Notifications.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 Host.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 2 Endpoints.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 476 Downtimes.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2023-09-27 20:23:58 +0000] information/ConfigItem: Instantiated 263 CheckCommands.
[2023-09-27 20:23:58 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-09-27 20:23:58 +0000] information/cli: Finished validating the configuration file(s).
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
$ icinga2 object list --type Endpoint

Object 'xxx' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 11:1-11:61
  * __name = "xxx"
  * host = "xxx"
    % = modified in '/etc/icinga2/zones.conf', lines 12:3-12:54
  * log_duration = 86400
  * name = "xxx"
  * package = "_etc"
  * port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 13:3-13:15
  * source_location
    * first_column = 1
    * first_line = 11
    * last_column = 61
    * last_line = 11
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "xxx" ]
    % = modified in '/etc/icinga2/zones.conf', lines 11:1-11:61
  * type = "Endpoint"
  * zone = "master"
    % = modified in '/etc/icinga2/zones.conf', lines 14:3-14:17

Object 'yyy' of type 'Endpoint':
  % declared in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 5:1-5:45
  * __name = "yyy"
  * host = "yyy"
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 6:3-6:38
  * log_duration = 86400
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 8:3-8:19
  * name = "yyy"
  * package = "_etc"
  * port = "5665"
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 7:3-7:13
  * source_location
    * first_column = 1
    * first_line = 5
    * last_column = 45
    * last_line = 5
    * path = "/etc/icinga2/conf.d/satellite-config/y.conf"
  * templates = [ "yyy" ]
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 5:1-5:45
  * type = "Endpoint"
  * zone = ""
$ icinga2 object list --type Zone

Object 'master' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 7:1-7:20
  * __name = "master"
  * endpoints = [ "xxx" ]
    % = modified in '/etc/icinga2/zones.conf', lines 8:3-8:63
  * global = false
  * name = "master"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 7
    * last_column = 20
    * last_line = 7
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 7:1-7:20
  * type = "Zone"
  * zone = ""

Object 'y-satellite' of type 'Zone':
  % declared in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 1:0-1:28
  * __name = "y-satellite"
  * endpoints = [ "yyy" ]
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 2:3-2:47
  * global = false
  * name = "y-satellite"
  * package = "_etc"
  * parent = "master"
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 3:3-3:19
  * source_location
    * first_column = 0
    * first_line = 1
    * last_column = 28
    * last_line = 1
    * path = "/etc/icinga2/conf.d/satellite-config/y.conf"
  * templates = [ "y-satellite" ]
    % = modified in '/etc/icinga2/conf.d/satellite-config/y.conf', lines 1:0-1:28
  * type = "Zone"
  * zone = ""

Additional context

We run Icinga on top of Kubernetes with an operator to create checks/services. The setup is fully automated, so nothing has been tampered with manually.

After downgrading back to 2.13.7 we found this:

  • /var/lib/icinga2/modified-attributes.conf contains 0 bytes on the satellite
  • API changes are directly reflected in the synchronized files
  • I could not find anything in the changelog which indicates a changed synchronization behaviour
  • Starting satellites and master is much faster in 2.14.0 compared to 2.13.7 (30s vs 2 minutes)

jan-kantert avatar Sep 27 '23 20:09 jan-kantert

Reproduced even with v2.13.0:

  • Patch https://github.com/Al2Klimov/twintowers like below
  • Start it
  • Stop "master" 2
  • curl -sSiku root:icinga -X PUT -H 'Accept: application/json' 'https://127.0.0.1:5661/v1/objects/hosts/h' -d '{"pretty":1,"attrs":{"check_command":"dummy","zone":"master2"}}'
  • curl -sSiku root:icinga -X PUT -H 'Accept: application/json' 'https://127.0.0.1:5661/v1/objects/services/h!s' -d '{"pretty":1,"attrs":{"check_command":"dummy","zone":"master2","vars.dummy_text":"WRONG"}}'
  • curl -sSiku root:icinga -X POST -H 'Accept: application/json' 'https://127.0.0.1:5661/v1/objects/services/h!s' -d '{"pretty":1,"attrs":{"vars.dummy_text":"RIGHT"}}'
  • Start "master" 2 again
  • Check says WRONG (as it's run on "master" 2) even though dummy_text is RIGHT
--- icinga2.conf
+++ icinga2.conf
@@ -17,8 +17,13 @@ object ApiUser "root" {
 object Endpoint "master1" { host = "master1" }
 object Endpoint "master2" { host = "master2" }

-object Zone "master" {
-       endpoints = [ "master1", "master2" ]
+object Zone "master1" {
+       endpoints = [ "master1" ]
+}
+
+object Zone "master2" {
+       endpoints = [ "master2" ]
+       parent = "master1"
 }

 object Zone "global-templates" { global = true }
--- docker-compose.yml
+++ docker-compose.yml
@@ -142,7 +142,9 @@ services:
       redis1:
         condition: service_started
     hostname: master1
-    image: icinga/icinga2
+    image: icinga/icinga2:2.14.0
+    ports:
+      - 5661:5665
     volumes:
       - ./volumes/icinga2/master1:/data
       - ./icinga2.conf:/data/etc/icinga2/icinga2.conf:ro
@@ -235,7 +237,9 @@ services:
       redis2:
         condition: service_started
     hostname: master2
-    image: icinga/icinga2
+    image: icinga/icinga2:2.14.0
+    ports:
+      - 5662:5665
     volumes:
       - ./volumes/icinga2/master2:/data
       - ./icinga2.conf:/data/etc/icinga2/icinga2.conf:ro
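
To check the outcome without waiting for a check result, one could query the service on both nodes (GET /v1/objects/services/h!s on ports 5661 and 5662 with this patch) and compare the vars. The following is a sketch under assumptions: `service_vars` and `mismatched_vars` are hypothetical helpers, and the two sample responses are hand-written to mirror the API's `results[].attrs` layout, not captured output.

```python
import json


def service_vars(response_text):
    """Extract the custom variables from a /v1/objects/services/<name> response."""
    results = json.loads(response_text).get("results", [])
    if not results:
        raise LookupError("no object in response")
    return results[0]["attrs"].get("vars") or {}


def mismatched_vars(first_response, second_response):
    """Return {name: (first_value, second_value)} for variables that differ."""
    first = service_vars(first_response)
    second = service_vars(second_response)
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(first.keys() | second.keys())
        if first.get(key) != second.get(key)
    }


# Hand-written sample responses (illustrative only).
master1 = '{"results": [{"name": "h!s", "type": "Service", "attrs": {"vars": {"dummy_text": "RIGHT"}}}]}'
master2 = '{"results": [{"name": "h!s", "type": "Service", "attrs": {"vars": {"dummy_text": "WRONG"}}}]}'

print(mismatched_vars(master1, master2))
# prints {'dummy_text': ('RIGHT', 'WRONG')}
```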

Al2Klimov avatar Sep 28 '23 11:09 Al2Klimov

Reproduced even with v2.13.0:

Do you mean v2.14.0 or v2.13.0? Because this seems to work fine with v2.13.7 for me. I will try v2.13.8 next week to narrow it down further.

jan-kantert avatar Sep 29 '23 07:09 jan-kantert

@Al2Klimov any idea how I could help to find the root cause here? I did not find anything in the diff but I also do not know the codebase well.

jan-kantert avatar Oct 25 '23 14:10 jan-kantert

First of all, does https://github.com/Icinga/icinga2/issues/9865#issuecomment-1738948072 match what you consider the problem?

Al2Klimov avatar Oct 25 '23 14:10 Al2Klimov

@Al2Klimov I tested your scenario above and yes that seems to be our issue.

In the meantime, we also tested v2.13.8, and it also seems to make things worse somehow. We see those issues on our satellites. Deleting the satellite and resyncing all config fixes the issue for some time, but it will reappear.

The difference between v2.13.8 and 2.14.0 seems to be that a resync recovers on v2.13.x, but it stays broken (or becomes worse) on v2.14.0, which also seems to miss modified-attributes.conf on the initial sync. So https://github.com/Icinga/icinga2/issues/9865#issuecomment-1738948072 matches the issue we see in v2.13.x. However, there might be an additional issue in v2.14.0.

jan-kantert avatar Nov 17 '23 17:11 jan-kantert