windows_exporter
windows_exporter service failed to start on reboot
After updates and rebooting the server, the windows_exporter service was not running
The windows_exporter service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
When I look at the recovery options of the windows_exporter service, they are not like those of other 'standard' Windows services. It looks like none of the others have "Reset fail count after" set to 0 and "Restart service after" set to 0.
exporter:

other examples:

I am not really an expert on service recovery settings, but maybe someone should look at these. Maybe it would be better to set these to 3 or 5 minutes?
https://docs.microsoft.com/en-us/archive/blogs/jcalev/some-tricks-with-service-restart-logic
https://social.microsoft.com/Forums/ro-RO/3db76753-4607-4a20-97a0-790c73e379cc/the-actions-after-system-service-failure?forum=winserver8gen
After updates and rebooting the server, the windows_exporter service was not running
+1, we had to restart service after updates
I think startup type should be Automatic(Delayed start) instead of Automatic
@f1-outsourcing Note that those services have "Subsequent failures" set to "Take no action", meaning they will simply stop trying if they fail to start twice. The first number, reset after, doesn't matter much when subsequent failures is set to restart. We could possibly set the restart interval to something higher to space restarts out, but before this report, we've never heard of this being a problem. As I've asked in the other issue, any logs that can be found about why it is failing are crucial to solving this, rather than attempting to work around it by changing settings. The exporter really shouldn't need anything else to be running to be able to start, so without any indication of what is going wrong, we can't really troubleshoot it.
@carlpett I've also noticed this behavior. I'm able to reproduce this consistently by rebooting one of the servers I manage. I'd fully expect to see logs in the "Application" event queue from the source "windows_exporter" when the service fails to start, but I don't. All I see is the same thing reported by @f1-outsourcing. Events are created for the service failing to start due to a timeout.
It's also worth noting that I've seen this issue on pretty much all of the ~200 Windows machines we have.
See the following screenshots:
The service fails to start due to timeout:

The service manager fails the service:

The application event queue has no windows_exporter entries in this time period:

Should I circumvent event viewer? I know stdout is a logger option but I didn't see an option to log to a flat file. If you've got some ideas for troubleshooting this I'd be willing to run whatever is needed. This issue has been quite troublesome for us during patching.
I have the same issue on Windows Server 2016. Event Viewer warning: "Collection timed out, still waiting for [cs os service]". windows_exporter (version=0.13.0, branch=master, revision=c62fe4477fb5072e569abb44144b77f1c6154016)
Same issue here.
Same issue on Server 2019. Fresh installed machines running the windows_exporter agent do not start the agent on reboot. Playing with the automatic restart options did not resolve the issue.
I think startup type should be Automatic(Delayed start) instead of Automatic
I agree. And at the very least, the First and Second failure actions should be set to "Restart the Service" with a delay of 1 minute.
same issue on windows 8.1
I've still been unable to reproduce this, unfortunately, so anything you can find about why it is happening on your systems, but not all, would be useful. The only thing that can fail during startup in the exporter code is really where we bind to the network interface, so potentially if the network hasn't come up yet. That'd lead to the exporter exiting though, not a timeout...
@babunatarajan You seem to have a completely different issue, since your error is a timeout during metric collection from a running exporter.
@carlpett If I see this correctly, it works with Delayed start, so my best guess is that the windows_exporter service starts and immediately exits again during its first try, probably because a dependency is not fulfilled at that early stage of boot. Maybe the network, I don't know. However, after installation the service shows no dependencies, and no restart options are set, so one failure during start and it stays off, which is not good... My suggestion: change the installer to make the service Automatic (Delayed) and set the restart options to a 1 day reset and 5 minutes between retries. Then this will work.
I already set Delayed Start as soon as it failed to start at boot, but never really tested it because it is a prod environment. Has anyone set Delayed Start and rebooted the server? If it works, we can keep this as a workaround.
Thanks
I set my servers to delayed start and it seemed to at least start correctly when Windows started up. I'm unsure if it would restart on failure correctly or not though.
There's a lot of different threads flying here, and a few misconceptions.
First off, regarding restarts. We already configure the service to restart on failure, and delay the restarts by five seconds. This is visible via sc qfailure windows_exporter, but the Services UI appears to only work with minutes, so it shows zero (it would probably make sense to bump this to 60 seconds to reduce confusion).
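To illustrate why a five-second delay can show up as 0 in the Services UI, here is a tiny sketch. It assumes the UI simply truncates the delay (which sc stores in milliseconds) to whole minutes; the helper name is made up for illustration and is not part of any tool.

```python
def delay_shown_in_minutes(delay_ms):
    """Whole minutes in a restart delay given in milliseconds (assumed UI behaviour)."""
    return delay_ms // 60_000


print(delay_shown_in_minutes(5_000))   # current 5-second delay -> 0
print(delay_shown_in_minutes(60_000))  # suggested 60-second delay -> 1
```

Under that assumption, bumping the delay to 60 seconds would make the UI show 1 instead of 0.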
Then, on the topic of Delayed starts. I'm not in principle against it (it will mean you will not have metrics for ~2 minutes longer than otherwise after a reboot, but that is probably not a huge deal in most cases), but there seems to be a mixed bag of experiences reported on whether it helps or not. I've now tried booting completely without networking and related services enabled, and it does not appear to prevent the windows_exporter from starting. So there's something deeper going on. Are any of you overriding the service account for the service, so you could have a dependency on Active Directory being available?
The 2019 machines I was seeing the problem on are AD joined and hardened with the CIS guidelines. I never had issues last year when I was still using Windows Server 2012 R2 and an older version of the exporter with the service starting correctly on reboot so maybe it's a 2019 Server issue?
Hi everyone,
I was able to get past this issue by running these commands:
Delayed start
sc config windows_exporter start= delayed-auto
Restart option
sc failure windows_exporter actions= restart/60000/restart/60000/""/60000 reset= 86400
Tested on Windows Server 2012 R2 / 2016 / 2019.
Hope it helps.
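For anyone unsure what the actions= string in the sc failure command above means, here is a rough Python sketch that splits it into action/delay pairs. The function name is hypothetical, and it assumes the usual sc conventions: delays in milliseconds, reset= in seconds, and an empty pair of quotes meaning "take no action".

```python
def parse_failure_actions(actions):
    """Split an `sc failure` actions string into (action, delay_ms) pairs."""
    parts = actions.split("/")
    pairs = []
    for action, delay in zip(parts[0::2], parts[1::2]):
        # An empty pair of quotes stands for "take no action".
        name = action.strip('"') or "none"
        pairs.append((name, int(delay)))
    return pairs


actions = 'restart/60000/restart/60000/""/60000'
for attempt, (action, delay_ms) in enumerate(parse_failure_actions(actions), start=1):
    print(f"failure {attempt}: {action} after {delay_ms // 1000} s")
print(f"failure count resets after {86400 // 3600} h")
```

Read this way, the command restarts the service after the first and second failure with a 60-second delay, takes no action on later failures, and resets the failure count after one day.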
I have the same problem on freshly provisioned Azure Windows VMs: windows_exporter fails to start after VM reboot.
enabled_collectors: "cpu,cs,iis,logical_disk,memory,net,os,service,system"
Solved for me with a folder exclusion rule in Windows Defender. I'm using windows_exporter v0.13. The problem appeared with the August Windows update on Windows 2016 servers.
Same issue here.
The windows_exporter service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
A timeout was reached (180000 milliseconds) while waiting for the windows_exporter service to connect.
I can confirm setting the service to Delayed Start fixed the issue. Why can't this be set to Delayed Start by default?
@josephB Good call on the exclusion. In our case it looks like our AV tools needed an exception following the August updates.
@chinhodado As I mention in my comment above, it doesn't seem to solve it very reliably. If we could figure out why it fixed it for you, that'd be a big step forward towards making a change. If it is related to antivirus starting up, as indicated by some other commenters lately, we'd be much better served by setting the correct service dependency.
I'll see if I can get more detail.
Setting delayed start doesn't help. Until it's fixed, I'm using a scheduled task that runs every 5 minutes and starts windows_exporter if it's not running.
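A watchdog along those lines could look roughly like the sketch below. This is not the script from the comment above, just a minimal illustration: the function names are made up, and it assumes English-language `sc query` output with a STATE line containing RUNNING.

```python
import subprocess

SERVICE = "windows_exporter"


def is_running(sc_query_output):
    """True if the STATE line of `sc query` output reports RUNNING."""
    for line in sc_query_output.splitlines():
        if line.strip().startswith("STATE"):
            return "RUNNING" in line
    return False


def ensure_running(service=SERVICE):
    """Start the service if `sc query` does not report it as running."""
    result = subprocess.run(
        ["sc.exe", "query", service], capture_output=True, text=True
    )
    if not is_running(result.stdout):
        # check=False: a failed start is left for the next scheduled run to retry.
        subprocess.run(["sc.exe", "start", service], check=False)
```

Calling `ensure_running()` from a script that Task Scheduler runs every five minutes would approximate the workaround. It only papers over the problem, though, and does not explain why the service fails to start.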
@dry4ng It'd be interesting to see if your case is solved with an exception in Windows Defender as mentioned above?
In my case, almost all my Windows Server 2016/2019 machines will start the service with the automatic delayed startup after a reboot. I seem to always have a few that do not and I have to go manually start them once I get alerted. I can confirm that I've removed the Windows Defender feature from my Windows 2019 servers because I am using a third-party AV software. I was also thinking of having some kind of work around to start up the service when it is stopped but had been hesitant to put one in place so far.
Is there any log that we can look at to debug why the service doesn't start? AFAIK the service doesn't generate any log file.
I installed 0.15 yesterday because I noticed it added a dependency for the Windows service on the WMI service. I experienced the same problem, where the service would not start with 0.15 when the startup type is set to Automatic. When I changed the startup type to Automatic (Delayed Start) after upgrading to 0.15, the service did start correctly after a reboot.
I noticed looking in the event viewer that the windows_exporter service did start but had problems collecting metrics, and I guess stopped itself, before the event that says the "Windows Management Instrumentation" service was started. Maybe this is the service that should be the dependency instead of or in addition to "WMI Performance Adapter"?
I decided to test my theory about changing the service dependency to the "Windows Management Instrumentation" service. I changed the service startup type back to automatic from delayed start and then changed the dependency from the "WMI Performance Adapter" to the "Windows Management Instrumentation" service. I then restarted 5 times and verified that the windows_exporter service was started each time.
After that, as a sanity check, I changed the dependency back to the "WMI Performance Adapter" and rebooted. On that reboot, however, the windows_exporter service did start correctly. I then rebooted again to see if I would get the same result, and I did. I'm therefore not sure whether changing the dependency is going to solve this problem or not. I would think, though, that depending on the WMI service directly would probably be a better idea: the performance adapter service on my system is set to manual start, and I observed that it was not starting up when I removed the windows_exporter dependency on it, so this dependency starts an additional service that was not previously running on my system.
I was testing on a Windows 2019 machine. Here are the commands I ran to change the service back to auto and then change the dependency to the WMI service itself instead of the performance adapter. Maybe someone else could do further testing to see if they are able to reproduce the error. If I had to take a random guess here, I think the problem would be more likely to occur on systems where it takes longer to start up the services on boot. My system is pretty quick to reboot and the windows_exporter service only sometimes fails to start, usually after a Windows update has been installed.
sc.exe config windows_exporter start= auto
sc.exe config windows_exporter depend= Winmgmt
I can confirm that my company also experiences the same issue with windows_exporter 0.15.0 on Windows Server 2016. The last stable release which did not cause this was wmi_exporter 0.9.0. The trend I have noticed is that the exporter fails to start only after Windows updates; if you perform a normal reboot, it works just fine.
@bpickhardt I am going to test your proposal about the Windows Management Instrumentation dependency on our prod servers. We do not perform Windows updates on all machines at the same time, so I can provide my findings this week.
We are starting to do more extensive testing of windows_exporter 0.15.0 and are noticing similar trends as mentioned above. We have a pool of 9 test servers ranging from 2008R2 - 2019, including 2019 Core. The problem is that there doesn't seem to be any indicator of the service stopping that I can find in the event logs which leads me to believe it isn't always starting after a reboot.
I'm investigating one 2016 server now. Here is the last time it shows in the application log as started:
69489 Jan 28 16:59 Information windows_exporter 100 Starting server on :9182
Uptime on the server is approximately 6 days, 9 hours, which means it rebooted on 2/10. Get-Hotfix shows a software update applied on 2/10, so that lines up with prior data indicating that a software update plus reboot causes the service not to start. I'm trying to find further evidence of this on other OS versions, as about 2/3 of the exporters in our test group aren't running at the moment. The only events showing on the system are in the Application log. I couldn't find anything in the System log.
I should also note that the service seems to start fine if I manually start it or reboot the server without any updates in progress or being applied to the system.
Is there any setting in exporter to turn on debugging so it logs more in the event log?
Regarding the service not starting, I was able to correlate the start failures with patch times, so there is clearly an issue with the service after a Windows update. Sorry, this is truncated because of PowerShell, but it looks similar to what others have reported.
308084 Feb 10 00:09 Error Service Control M... 3221232472 The windows_exporter service failed to start due to the following error: ...
308083 Feb 10 00:09 Error Service Control M... 3221232481 A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
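The large numbers in that listing look like full InstanceId values rather than plain event IDs (an assumption on my part, based on how PowerShell prints event log entries). If so, the event ID sits in the low 16 bits, and a quick sketch shows how to recover it; the function name is just for illustration.

```python
def event_id(instance_id):
    """Return the low 16 bits of an event InstanceId, which hold the event ID."""
    return instance_id & 0xFFFF


for instance_id in (3221232472, 3221232481):
    print(instance_id, "->", event_id(instance_id))
# 3221232472 -> 7000
# 3221232481 -> 7009
```

Event IDs 7000 and 7009 from the Service Control Manager are the usual "service failed to start" and "timeout waiting for the service to connect" pair, which matches the message texts shown.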