icinga-powershell-plugins
Invoke-IcingaCheckService returns state OK even when a service is stopped
This morning a user called to say that printing didn't work. This is never good; that is why we have monitoring, so that we know about problems before the users do. There was some work on the printers last week, so I called our supplier, who told me that most likely a service was not running. I checked, and indeed one of the required services was not running. The service's startup type is Automatic, it was in a stopped state, and Invoke-IcingaCheckService happily reported:

    PS C:\Windows\system32> Invoke-IcingaCheckService
    [OK] Services: 58 Ok

The performance data, however, reports it correctly:

    'running_services'=55;; 'stopped_services'=3

When I started the service, the running_services metric went to 56 and stopped_services went to 2. The remaining 2 are set to Automatic (Trigger Start), so it is OK for them not to be running.
I tested with another service whose startup type is normally Manual: I changed it to Automatic but didn't start it, and it was reported correctly: [CRITICAL] Services: 1 Critical ....

Why doesn't Invoke-IcingaCheckService report this stopped service? It is not excluded on the command line. Are there any defaults that make this check return OK?
I guess the key to the issue is documented: "... The plugin can be used to check for all services which are configured to run automatically on Windows startup by not defining a specific service during plugin call, In this case the plugin will return 'CRITICAL' for services which are set to Automatic and not running, but only if the service ExitCode is not 0."

May I ask why we don't treat a non-running service as a critical issue if it terminated with ExitCode 0? In my case, the fiddling with the printing system last week made this service stop: it depended on another service that was restarted, so it stopped (presumably with ExitCode 0). Since no other services depend on this one, it was never restarted, and I was never notified that it had stopped.

Could I have an argument that disables the "but only if the service ExitCode is not 0" behavior? Thank you.
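The documented rule quoted above can be sketched as a small decision function. This is a minimal, hypothetical illustration, not the plugin's actual code: the function name `Get-AutoServiceVerdict` and the property names `StartType`, `Status` and `ExitCode` are assumptions chosen for readability.

```powershell
# Hypothetical sketch of the documented rule for the "no service specified" mode.
# Function name and properties (StartType, Status, ExitCode) are illustrative
# assumptions, not the plugin's internal API.
function Get-AutoServiceVerdict {
    param (
        [Parameter(Mandatory = $true)]
        [psobject]$Service
    )

    # Only services with startup type Automatic are evaluated in this mode.
    if ($Service.StartType -ne 'Automatic') { return 'Ignored' }

    # A running automatic service is fine.
    if ($Service.Status -eq 'Running') { return 'OK' }

    # Stopped with exit code 0 is treated as a clean, intentional stop.
    if ($Service.ExitCode -eq 0) { return 'OK' }

    # Stopped with a non-zero exit code is reported.
    return 'CRITICAL'
}

$stopped = [pscustomobject]@{ Name = 'ExampleSvc'; StartType = 'Automatic'; Status = 'Stopped'; ExitCode = 0 }
Get-AutoServiceVerdict -Service $stopped   # OK, the situation described in this issue
```

Under this rule, a service that stopped cleanly lands in the OK branch. The Manual-to-Automatic test above most likely came back CRITICAL because Windows typically reports exit code 1077 (ERROR_SERVICE_NEVER_STARTED) for a service that has not been started since boot.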
Hello,
Thank you for the issue. The reason for this behavior is that vendors use certain services to trigger updates or one-time tasks automatically. Once they are done, most of these services terminate themselves with exit code 0.

That means that, if such stopped services were reported, all of them would always be critical, and you would never have a chance to monitor your service environment properly, because there will most likely be one or two services that always produce a false positive.

If you shut down the printer service gracefully, it exits properly and is not detected by the service check. If you kill the spooler service with Task Manager, the plugin will report CRITICAL.
Now the issue is: when you add an exclude filter, I have no way of knowing whether you want only the services with startup type Automatic or all services monitored. That would result in another flag telling the plugin to fetch automatic services only instead of all of them.

In my opinion, that makes the plugin far more complex, given the many other filters it already has.
Any thoughts on that?
Hello,

At the moment I don't use an exclude filter. Looking at the code, it would be trivial to implement another parameter:

    .PARAMETER ExitZeroIsOK
        This will tell the plugin to return OK instead of Critical in case an Automatic service was stopped with exit code 0.

In the param () definition:

    [switch]$ExitZeroIsOK = $TRUE,

The ExitCode check line would then look like this:

    if (($autoservice.configuration.ExitCode -eq 0) -and $ExitZeroIsOK) {
That way, I could tell the plugin that exit code 0 is not OK for me. Yes, I will need to exclude the services that regularly change state, but that's how it was in the old days too. I'd rather avoid giving a list of services to check, because if new services are introduced, I might forget to add them to the list.
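To make the proposal concrete, here is a hypothetical sketch of how such an opt-in could behave for a stopped automatic service. `-ExitZeroIsOK` does not exist in Invoke-IcingaCheckService, and `Test-StoppedAutoService` is an illustrative name. The flag is declared as `[bool]` rather than `[switch]` in this sketch, because a switch that defaults to `$true` can only be disabled with the `-ExitZeroIsOK:$false` syntax.

```powershell
# Hypothetical sketch of the proposed opt-in; -ExitZeroIsOK is NOT an existing
# parameter of Invoke-IcingaCheckService.
function Test-StoppedAutoService {
    param (
        # Exit code of an automatic service that is currently stopped.
        [Parameter(Mandatory = $true)]
        [long]$ExitCode,

        # $true (default) keeps today's behavior: exit code 0 counts as a clean stop.
        [bool]$ExitZeroIsOK = $true
    )

    # Note: PowerShell uses -and for logical AND inside conditions, not &&.
    if (($ExitCode -eq 0) -and $ExitZeroIsOK) {
        return 'OK'
    }
    return 'CRITICAL'
}

Test-StoppedAutoService -ExitCode 0                        # OK (current behavior)
Test-StoppedAutoService -ExitCode 0 -ExitZeroIsOK $false   # CRITICAL (opt-in)
Test-StoppedAutoService -ExitCode 1067                     # CRITICAL (non-zero exit code)
```

The default keeps the current behavior, so existing checks would be unaffected; only users who explicitly pass `-ExitZeroIsOK $false` would see cleanly stopped automatic services as CRITICAL.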
The problem is that once you add this flag and services appear that you don't want, you will want to exclude them, so you have to specify them with the filter.
Once the filter is set, the plugin will assume that you are looking for all services, which would require another flag to change that behavior as well.
I know how it worked in the old days, but in my opinion a bad implementation is no excuse for continuing with a bad implementation.

While I understand the problem in this case, I want to make sure we are on the same page: this change would require 3 additional flags for the plugin to work properly, which in my opinion increases the complexity for users who are not very familiar with monitoring.
"... Once the filter is set, the plugin will assume that you are looking for all services, which would require another flag to change that behavior as well."
I may be wrong, but I don't see that behavior in the code:
https://github.com/Icinga/icinga-powershell-plugins/blob/master/plugins/Invoke-IcingaCheckService.psm1
- At line 93 it checks whether I provided a list of services. Normally I don't, so the count is zero.
- At line 94 it enumerates the services and excludes the ones I supply to be excluded.
- At line 95 it loops through the services it found.
- At line 98 it filters out all services that are not set to Automatic start.
- At line 105 it checks whether the service is running.
- At line 113 is the code I don't like, which accepts an automatic service being stopped if it exited with 0.
- At line 121 it checks the rest: the automatic services that are supposed to be running but aren't.
- At line 125 the main if ends.

The next part is only relevant if we supply service names to check.

I've just verified that a plain Invoke-IcingaCheckService returns [OK] Services: 58 Ok, and Invoke-IcingaCheckService -Exclude DHCP returns [OK] Services: 57 Ok. So supplying an exclude doesn't change the behavior; it still checks the automatic services only.
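The walk-through above can be modeled in a few lines to show why an exclude list only shrinks the set of checked services instead of switching to "all services". This is a simplified, hypothetical model: `Measure-AutoServiceState`, the in-memory service objects and their property names are assumptions, and the real plugin works on its own service data and produces Icinga check results rather than plain counters.

```powershell
# Hypothetical model of the "no -Service list given" path walked through above.
# Service objects and property names (Name, StartType, Status, ExitCode) are
# assumptions, not the plugin's internal structures.
function Measure-AutoServiceState {
    param (
        [object[]]$Services,
        [string[]]$Exclude = @()
    )

    $ok = 0
    $critical = 0

    foreach ($svc in $Services) {
        # The exclude list only removes names from the candidate set (case-insensitive).
        if ($Exclude -contains $svc.Name) { continue }

        # The Automatic-only filter applies whether or not an exclude list was given.
        if ($svc.StartType -ne 'Automatic') { continue }

        # Running, or stopped with exit code 0, counts as OK; anything else is CRITICAL.
        if ($svc.Status -eq 'Running' -or $svc.ExitCode -eq 0) {
            $ok++
        } else {
            $critical++
        }
    }

    [pscustomobject]@{ OK = $ok; Critical = $critical }
}

$services = @(
    [pscustomobject]@{ Name = 'Dhcp';       StartType = 'Automatic'; Status = 'Running'; ExitCode = 0 }
    [pscustomobject]@{ Name = 'ExampleSvc'; StartType = 'Automatic'; Status = 'Stopped'; ExitCode = 0 }
    [pscustomobject]@{ Name = 'ManualSvc';  StartType = 'Manual';    Status = 'Stopped'; ExitCode = 1077 }
)

Measure-AutoServiceState -Services $services                  # returns OK = 2, Critical = 0
Measure-AutoServiceState -Services $services -Exclude 'DHCP'  # returns OK = 1, Critical = 0
```

Excluding one running automatic service lowers the OK count by one, mirroring the drop from 58 to 57 observed above, while the Manual service is skipped in both runs.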
During this year's OSMC I gave a talk about Icinga for Windows and did a little deep-dive into the topic:
https://youtu.be/Y_FQjRymPBU?t=1413
After a lot of consideration, I would not want to change the current behavior of the plugin.