icingaweb2-module-vspheredb

Overriding monitoring rules

Open Virsacer opened this issue 1 year ago • 12 comments

Expected Behavior

When having "Global Monitoring Rules" and "Folder Monitoring Rules" with overlapping settings, the latter should "win".
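
The expected precedence can be sketched as a merge from the least specific level to the most specific one. This is only a minimal illustration in Python, not the module's implementation: `resolve_settings` and the dictionary layout are hypothetical, and only the setting name and its values come from this report.

```python
def resolve_settings(global_rules, folder_chain):
    """Merge rule settings from least to most specific.

    folder_chain is ordered from the root folder down to the folder that
    directly contains the object, so later entries override earlier ones.
    (Hypothetical helper, used for illustration only.)
    """
    effective = dict(global_rules)
    for folder_rules in folder_chain:
        effective.update(folder_rules)
    return effective


# Values taken from the report: "Do nothing" globally, "Critical" on one folder.
global_rules = {"trigger_on_poweredOff": "ignore"}
always_on_folder = {"trigger_on_poweredOff": "critical"}

# An intermediate folder without rules of its own, then the folder with the override.
effective = resolve_settings(global_rules, [{}, always_on_folder])
print(effective["trigger_on_poweredOff"])  # critical
```

With this ordering, a folder that does not define a setting simply lets the value from further up pass through.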

Current Behavior

I have set "When powered off" to "Do nothing" for VMs globally. But for one folder I have set it to "Trigger a Critical state". Unfortunately the global setting is used for VMs in that folder:

  [OK] Power State
   \_ [OK] Virtual Machine has been powered off

Your Environment

  • VMware vCenter®/ESXi™-Version: 7.0.3
  • Version/GIT-Hash of this module: 1.7.1
  • Icinga Web 2 version: 2.11.4
  • Operating System and version: Oracle Linux 7
  • Webserver, PHP versions: PHP 7.3.33

Virsacer avatar Jun 09 '23 10:06 Virsacer

Please have a look at --inspect: does it reflect what you're seeing?

Thomas-Gelf avatar Jun 09 '23 13:06 Thomas-Gelf

Hi, I did not know that parameter...

  [OK] Power State (--rule ObjectStatePolicy/PowerState)
   trigger_on_poweredOff = "ignore"
   trigger_on_suspended = "critical" (inherited from AlwaysOnFolder)
   trigger_on_unknown = "critical" (inherited from AlwaysOnFolder)
   \_ [OK] Virtual Machine has been powered off

So "trigger_on_poweredOff" comes from Global, but should be overridden by "AlwaysOnFolder"

Virsacer avatar Jun 09 '23 13:06 Virsacer

mysql --binary-as-hex vspheredb -e 'SELECT * FROM monitoring_rule_set\G'

mysql --binary-as-hex vspheredb -e "SELECT * FROM object WHERE object_name = 'AlwaysOnFolder'\G"

Thomas-Gelf avatar Jun 09 '23 13:06 Thomas-Gelf

*************************** 1. row ***************************
  object_uuid: 0x
object_folder: vm
     settings: {"ObjectStatePolicy/PowerState/critical_for_uptime_greater_than_days":999,"ObjectStatePolicy/PowerState/trigger_on_poweredOff":"ignore","ObjectStatePolicy/PowerState/warning_for_uptime_greater_than_days":999}
*************************** 2. row ***************************
  object_uuid: 0x499A6581CE425D67B70D22D33CE5DEC1
object_folder: vm
     settings: {"ObjectStatePolicy/PowerState/trigger_on_poweredOff":"critical","ObjectStatePolicy/PowerState/trigger_on_suspended":"critical","ObjectStatePolicy/PowerState/trigger_on_unknown":"critical"}



+------------------------------------+------------------------------------+---------------+----------------+-------------+----------------+-------+------------------------------------+------+
| uuid                               | vcenter_uuid                       | moref         | object_name    | object_type | overall_status | level | parent_uuid                        | tags |
+------------------------------------+------------------------------------+---------------+----------------+-------------+----------------+-------+------------------------------------+------+
| 0x499A6581CE425D67B70D22D33CE5DEC1 | 0x0BD3C813BF9240FF8EE78E6E26FB44D3 | group-v140507 | AlwaysOnFolder | Folder      | gray           |     5 | 0xE3488B78CF4759CA96C2EBC4EEE5C6BD | []   |
+------------------------------------+------------------------------------+---------------+----------------+-------------+----------------+-------+------------------------------------+------+

Virsacer avatar Jun 09 '23 13:06 Virsacer
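
To see at a glance which keys are defined on both levels, the two `settings` values from the query output can be compared directly. A small sketch, assuming the row with the empty `object_uuid` holds the global rules (the thread suggests this but does not state it); the variable names are made up for illustration.

```python
import json

# The two `settings` values as returned by the monitoring_rule_set query.
# Assumption: the row with object_uuid "0x" holds the global rules.
global_row = '{"ObjectStatePolicy/PowerState/critical_for_uptime_greater_than_days":999,"ObjectStatePolicy/PowerState/trigger_on_poweredOff":"ignore","ObjectStatePolicy/PowerState/warning_for_uptime_greater_than_days":999}'
folder_row = '{"ObjectStatePolicy/PowerState/trigger_on_poweredOff":"critical","ObjectStatePolicy/PowerState/trigger_on_suspended":"critical","ObjectStatePolicy/PowerState/trigger_on_unknown":"critical"}'

global_settings = json.loads(global_row)
folder_settings = json.loads(folder_row)

# Keys present on both levels are the ones where the folder should override.
overlap = sorted(set(global_settings) & set(folder_settings))
for key in overlap:
    print(f"{key}: global={global_settings[key]!r}, folder={folder_settings[key]!r}")
```

The only overlapping key is `ObjectStatePolicy/PowerState/trigger_on_poweredOff`, which is exactly the setting that --inspect reports without an "inherited from AlwaysOnFolder" note.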

Looks good to me... strange

Thomas-Gelf avatar Jun 09 '23 13:06 Thomas-Gelf

There used to be related bugs, but as you're running v1.7.1, they should all have been fixed. Could you please try to set it to "Trigger a warning" on some other folder between AlwaysOnFolder and your root? Does that change anything?

Thomas-Gelf avatar Jun 09 '23 13:06 Thomas-Gelf

  [WARNING] Power State (--rule ObjectStatePolicy/PowerState)
   critical_for_uptime_greater_than_days = 999
   trigger_on_poweredOff = "warning" (inherited from Kunden)
   trigger_on_suspended = "critical" (inherited from AlwaysOnFolder)
   trigger_on_unknown = "critical" (inherited from AlwaysOnFolder)
   warning_for_uptime_greater_than_days = 999
   warning_for_uptime_less_than = 900
   \_ [WARNING] Virtual Machine has been powered off

Virsacer avatar Jun 09 '23 13:06 Virsacer

Looks like it only happens when a setting is on both root and leaf. When it is on root and in the middle, or only at the leaf, it works...

Virsacer avatar Jun 09 '23 13:06 Virsacer

If you remove it on the leaf, and set it one level above - does it then work?

Thomas-Gelf avatar Jun 09 '23 14:06 Thomas-Gelf

I did some more tests and always set the same value for all three parameters:

When set only on the leaf, it works fine. As soon as it is set on ANY other level(s), the leaf is ignored. When it is set on multiple non-leaf levels, the value from the level closest to the leaf wins (but never the leaf itself).

Virsacer avatar Jun 09 '23 15:06 Virsacer
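
The symptoms described in these tests can be written down as a small model, which may help when comparing against further --inspect output. This is a sketch of the reported behaviour only, not of the module's code; `observed_resolve` and the list-of-dicts layout are hypothetical.

```python
def observed_resolve(key, levels):
    """Model of the *reported* behaviour, not of the module's code.

    levels is ordered from the most general level (global/root) down to
    the leaf; each entry is a dict of the settings defined at that level.
    """
    *ancestors, leaf = levels
    # Reported: the non-leaf level closest to the leaf wins ...
    for settings in reversed(ancestors):
        if key in settings:
            return settings[key]
    # ... and the leaf is only used when no other level defines the key.
    return leaf.get(key)


key = "trigger_on_poweredOff"

# Only on the leaf: works as expected.
print(observed_resolve(key, [{}, {}, {key: "critical"}]))  # critical
# On root and leaf: the root value is used instead of the leaf.
print(observed_resolve(key, [{key: "ignore"}, {}, {key: "critical"}]))  # ignore
# On root, an intermediate folder and leaf: the intermediate folder wins.
print(observed_resolve(key, [{key: "ignore"}, {key: "warning"}, {key: "critical"}]))  # warning
```

The three calls mirror the cases in this thread: leaf only, Global plus AlwaysOnFolder, and Global plus an intermediate folder plus AlwaysOnFolder.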

Hi there, adding to this, as I'm seeing the same issue with the same behaviour (also running v1.7.1): when "Enabled" is set to "Please choose" for any setting, the object is monitored as though the setting were enabled. Also, once monitoring thresholds are set on a branch, they take precedence over the thresholds of a leaf, even when the monitoring is set to "Please choose" on a branch or leaf.

Leaf: image

Next closest branch: image

Host in the leaf group:

[WARNING] Host System, according configured rules
   [OK] Object State Policy (--rule ObjectStatePolicy/*)
      [OK] Overall VMware Object State (--rule ObjectStatePolicy/VMwareObjectState)
       trigger_on_gray = "warning"
       trigger_on_red = "warning"
       trigger_on_yellow = "warning"
       \_ [OK] Overall VMware status is 'green'
      [OK] Power State (--rule ObjectStatePolicy/PowerState)
       critical_for_uptime_greater_than_days = 600
       critical_for_uptime_less_than = 0
       trigger_on_poweredOff = "ignore"
       trigger_on_suspended = "warning"
       trigger_on_unknown = "unknown"
       warning_for_uptime_greater_than_days = 365
       warning_for_uptime_less_than = 0
       \_ [OK] Host System is powered on
       \_ [OK] System booted 298d 2h ago
   [WARNING] Compute Resource Usage (--rule ComputeResourceUsage/*)
      [OK] CPU Usage (--rule ComputeResourceUsage/CpuUsage)
       critical_if_less_than_percent_free = 10 (inherited from [leaf])
       warning_if_less_than_percent_free = 30 (inherited from [leaf])
       \_ [OK] 3.13 GHz out of 57.5 GHz used, 54.3 GHz (94.54%) free
      [WARNING] Memory Usage (--rule ComputeResourceUsage/MemoryUsage)
       critical_if_less_than_percent_free = 2 (inherited from [leaf])
       threshold_precedence = "best_wins"
       warning_if_less_than_percent_free = 20 (inherited from [closest branch])
       \_ [WARNING] 79.32 GiB out of 511.70 GiB (15.50%) free

Additionally, the --inspect output doesn't mention when a setting is inherited from the global "All vCenters" rules. I find that somewhat unintuitive, since all other inheritances are shown.

Nayakum avatar Oct 24 '23 10:10 Nayakum

We just stumbled over this issue while trying to set individual limits for one datastore. The behaviour is exactly as described a few comments above: if there is a setting on the path to the leaf, the settings directly at the leaf are ignored. Any idea whether there will be a fix in the near future?

edpstiffel avatar Feb 16 '24 14:02 edpstiffel