windows_exporter
windows_exporter copied to clipboard
windows_exporter v0.14 don't auto start
Windows_exporter v0.14 don't auto start service when restart server. I must start service manual.
I'm experiencing the same problem. I didn't use any version before 0.14 so I don't know whether this version introduced the problem.
A timeout was reached (30000 milliseconds) while waiting for the "windows_exporter" service to connect.
followed by
The "windows_exporter" service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
Unfortunately, the Windows exporter does not log anything at that time, so there is not much to work with. Starting it manually afterwards has always succeeded. This happens consistently on Windows 2012R2 (scheduled restarts over night) while 2016 and 2019 seem to work fine. I will try this on a clean 2012R2.
Might this be a missing dependency?
@hieu15 Can you verify if you see the same error messages in the event viewer? Otherwise this might be a different problem.
I can confirm that this happens on a fresh 2012R2
- Download ISO from https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2012-r2
- Fresh install on your favorite hypervisor
- Install Windows exporter
- The exporter runs fine at this point
- Restart Windows
- The exporter is in the "starting" phase for about 30 seconds before the error messages appear in the event viewer
I've tried changing the start up type to "automatic delayed". The exporter starts fine in that case.
I've tested on a fresh 2016 and it doesn't show any problems, so I've tested on 2012R2 a bit more. This is where things get a bit crazy. Version 0.13 shows the same problems. However, v0.12 works perfectly fine. I've rebooted multiple times, it's always consistent. I did a clean uninstall of the exporter before installing a new one.
According to the release notes the only change in 0.13 is the project name. This seems crazy, but could the name have an effect on the start up order?
@fischerman any relation to https://github.com/prometheus-community/windows_exporter/issues/544 or https://github.com/prometheus-community/windows_exporter/issues/599 maybe? They all seem to be about not starting exporters (after reboot).
@fischerman That's very odd, but thanks for the exhaustive testing!
I just did a check on the full diff between those versions, and there really isn't anything apart from the name changes. There could potentially be some differences coming from the CI build machines updating something (Go compiler could have incremented some patch version, possibly).
Would you be able to run a test with the latest version on master
? #630 bumped the Go version in order to fix a compiler bug which shouldn't be related to this, but...
@carlpett I didn't find an installer of master so I installed from 0.14 and swapped the .exe with https://ci.appveyor.com/project/prometheus-community/windows-exporter/builds/35870173/artifacts
As suspected this is not the cause.
@frittentheke I don't think they are related. But it gave me the idea to test 32 bit. Same problem.
Thanks for checking! Your comment
This seems crazy, but could the name have an effect on the start up order?
... has been bouncing around in the back of my head during the day. And after managing a repro I now have a working theory. Could you test if running sc config windows_exporter depend= wmiApSrv
fixes it for you?
I can't say this cleanly explains every report, but my thinking goes like this:
- The service control manager uses lexicographic sorting as a tie-breaker when choosing start order. With the old name, we'd start slightly later.
- During startup, we make two calls that I think could potentially hang if some dependent service isn't started: https://github.com/prometheus-community/windows_exporter/blob/7eab1fc4118ad31365d06cde6020f5a80a88767e/exporter.go#L316 https://github.com/prometheus/common/blob/cfeb6f9992ffa54aaa4f2170ade4067ee478b250/log/eventlog_formatter.go#L51
Some reasons it could work on other OSes would be that they have something earlier in the startup phase which starts whatever it is we need to be started.
I'm not 100% confident the wmiApSrv is the right dependency, or if it just happens to initialize something as a side effect. It'd had been a much neater explanation if it was also in the Auto start group (it's manual), and if it'd stayed running (on my repro-machine at least it stops quite quickly).
Reason I point to the logging initialization is that I can't seem to get anything to log even if I throw it before initWbem
, so we must get stuck earlier. It seems a bit odd that logging wouldn't work, though.
But would be super interesting to see if this fixes it for you as well!
(As a side note, renaming the service to eg xwindows_exporter also seems to fix it on my VM)
I've added the dependency and did 3 restarts. The service came up fine every time. After adding the dependency, the WMI Performance adapter stays running in my case, possibly indefinitely (also in manual
). However, in the field I see the latter service constantly flapping.
If the initialization of the event logger is the problem can we temporarily write the error message into a file?
Thanks for the test! It's a bit unsatisfying to not really understand why, but since it does seem to fix the problem, I think we should probably add it to the next release. The current logger implementation doesn't have a file logger, but we could probably add one fairly easy.
I did a little bit more testing. I manually wrote stuff in log files. In all the test cases, no log file was written during Windows start up, but were showing up fine when starting the service afterwards. The only exception was a hello world program (built with promu) with which I replaced the windows exporter. That one always produced logs, so filesystem is generally available.
I believe the problem occurs before the main function, hence in one of init functions. I had my eye on the MapCounterToIndex
which calls the perflib library internally (the perflib library may panic) but it appears that the problem is even before that (no call to registerCollectors
occurs). The only init function in the project that is special is cpu.go
, but even after removing the version check, no log is produced. So my best guess is that there is a panic in the init function of one of the dependencies.
I also wrapped the exporter in nssm to get the stderr, but that one starts fine...
@fischerman I had this same issue with windows_exporter v0.13 on Win2012R2. I found two culprits which contributed to the issue.
- Set the windows_exporter service startup to Automatic (Delayed Start).
- Apply all Windows Update patches on Win2012R2 In most cases #1 is all it took, but I also discovered stability issues if #2 had not been done. I was unable to determine which patch was affecting windows_exporter, but since fully patching things have worked fine.
Delayed start is also my current workaround. I'm still hoping we can find the root cause. However, if my suspicion is correct and an init function of a dependency causes the crash it will be hard to discover. One idea I just thought of is to generate a core dump using GOTRACEBACK=crash
.
I also have this issue. Thank you @fischerman for all the testing! I was going to suggest moving the windows service handler stuff to the beginning of main() as I believe it is the control messages from the service that makes Windows understand that the service is there. But, if it really crashes before main(), then it is no use.
One idea I just thought of is to generate a core dump using GOTRACEBACK=crash.
Good idea! I think Go doesn't yet produce minidumps without patching: https://github.com/golang/go/issues/20498
It could perhaps be possible to use GOTRACEBACK=crash (or =system) and in some way get stderr redirected to a file to see the tracebacks. I know there are wrappers like winsw that can do that, but it sounds like quite a hassle.
I experience the issue too but what's interesting is that the service crashed within a second but the log message show that it tried for 30 seconds.
A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
The windows_exporter service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
Update: I could only get it fixed by restarting the vm.
@basroovers Which version do you use? How do you know that the service crashed after one second?
@fischerman I have hundreds of instances running on v0.15.0.
Most of these didn't autostart after the last Windows update cycle (reboots). One of them couldn't even get the service started and needed the reboot
Here is an export of the Windows event log of that VM:
Level,Date and Time,Source,Event ID,Task Category
Error,17-12-2020 08:19:45,Service Control Manager,7000,None,"The windows_exporter service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion."
Error,17-12-2020 08:19:45,Service Control Manager,7009,None,A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
The problem occured yesterday on a 2016 server with 0.14. System has now been configured to use the delayed start.
I also have same issue from time to time using or not "delayed start" – it doesn't matter. And I have an idea about this "time to time". I found that exporter's start failure is more likely to occur in case of reboot after installed updates. Taking into account that process of installing updates continues after restart I think that exporter can not start because some exporter's dependencies are still updating or even not yet started.
Exporter didn't start tonight after automatic reboot in reason of installing updates for Windows Server 2019 Std, but later, after manual reboot it started well.
Updates were installed: February 9, 2021-KB4601887 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows 10, version 1809 and Windows Server, version 2019 http://support.microsoft.com/kb/4601887 February 9, 2021—KB4601345 (OS Build 17763.1757) https://support.microsoft.com/help/4601345 KB4601393: Servicing stack update for Windows 10, version 1809: February 9, 2021 https://support.microsoft.com/help/4601393
Hope it helps.
We've updated to v0.16.0, but we experienced the same thing again last weekend. Several Windows updates were installed after which the windows_exporter service was not running.
Same messages:
The windows_exporter service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
Perhaps good to check if this is a VMware specific issue? We are running VMware.
I updated to 0.16.0 right before the April update cycle in our test environment. I also modified the service to Automatic (Delayed Start). Results were mixed. 2008R2 and 2019 did not start windows_exporter after the automatic patch and restart 2012R2 and 2016 servers started windows_exporter successfully after the automatic patch and restart
Tim
On Mon, Apr 19, 2021 at 8:31 AM Bas Roovers @.***> wrote:
We've updated to v0.16.0, but we experienced the same thing again last weekend. Several Windows updates were installed after which the windows_exporter service was not running.
Same messages:
The windows_exporter service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
Perhaps good to check if this is a VMware specific issue? We are running VMware.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/prometheus-community/windows_exporter/issues/637#issuecomment-822468923, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUP2G4ZOWPW424LLKO3P2TTJQWEVANCNFSM4S2R7NTA .
-- Tim Biles Sys/Database Design/Admin 3 | ITSS | itss.d.umn.edu Storage Champion | z.umn.edu/scn University of Minnesota Duluth | www.d.umn.edu @.*** | 218-726-6959
Just to confirm, I ran into this issue with v0.16.0
Exporter failed to start on 2 out of 10 servers after cumulative updates.
- Server2012R2
- Vmware virtual machine
- Service set to Automatic (Delayed Start)
David
My test results were slightly different than last month. 2008R2 consistently has failed, 2012R2 and 2016 consistently succeed, 2019 is intermittent. All mine are VSphere VMs, all set to Automatic (Delayed Start).
3 of 10 machines failed to start windows_exporter (1 2008R2, 2 2019). 7 machines started windows exporter (4 2012R2, 1 2016, 2 2019)
Tim
On Wed, May 12, 2021 at 3:54 AM davideverall @.***> wrote:
Just to confirm, I ran into this issue with v0.16.0
Exporter failed to start on 2 out of 10 servers after cumulative updates.
- Server2012R2
- Vmware virtual machine
- Service set to Automatic (Delayed Start)
David
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/prometheus-community/windows_exporter/issues/637#issuecomment-839594662, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUP2GZQ4VGB73J2QGTF4NLTNI63ZANCNFSM4S2R7NTA .
-- Tim Biles Sys/Database Design/Admin 3 | ITSS | itss.d.umn.edu Storage Champion | z.umn.edu/scn University of Minnesota Duluth | www.d.umn.edu @.*** | 218-726-6959
I see this issue frequently, occurring typically when there are alot of Windows updates released and available for install.
After the updates are installed and the server is restarted, if the server is under heavy CPU load whilst the updates are installing, then the Windows_exporter service does not start up.
Usually, it is the Windows Modules Installer Worker(TiWorker.exe) service which is using up high CPU on the server to install the updates. I use mainly Windows Server 2016.
Would it be possible to put some logic in whereby the "Windows_exporter" service is checked every x minutes to check its up and running, and if not, to start it up?
Just to report it, it seems to be happen with v0.16. still, on server 2019...
Get-Package *exporter*
Name Version Source ProviderName
---- ------- ------ ------------
windows_exporter 0.16.0
Get-ComputerInfo | select WindowsProductName, WindowsVersion, OsHardwareAbstractionLayer
WindowsProductName WindowsVersion OsHardwareAbstractionLayer
------------------ -------------- --------------------------
Windows Server 2019 Standard 1809 10.0.17763.2686
TimeCreated Id LevelDisplayName Message
----------- -- ---------------- -------
13/04/2022 09:02:53 7036 Information The windows_exporter service entered the running state.
13/04/2022 03:35:04 7000 Error The windows_exporter service failed to start due to the following error: ...
13/04/2022 03:35:04 7009 Error A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
12/04/2022 13:23:43 7036 Information The windows_exporter service entered the running state.
12/04/2022 06:11:32 7036 Information The windows_exporter service entered the stopped state.
15/03/2022 11:37:37 7036 Information The windows_exporter service entered the running state.
09/03/2022 03:32:47 7000 Error The windows_exporter service failed to start due to the following error: ...
09/03/2022 03:32:47 7009 Error A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
09/03/2022 03:30:42 7036 Information The windows_exporter service entered the stopped state.
Event at 12/04/2022 13:23:43
was a system start, event at 13/04/2022 09:02:53
is me starting the service. Apparently there were at least one earlier occurance too.
Also it is not isolated, at least one others server shows this behaviour (that one is the same OS version):
TimeCreated Id LevelDisplayName Message
----------- -- ---------------- -------
17/03/2022 10:28:10 7036 Information The windows_exporter service entered the running state.
17/03/2022 00:32:36 7000 Error The windows_exporter service failed to start due to the following error: ...
17/03/2022 00:32:36 7009 Error A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
17/03/2022 00:31:40 7036 Information The windows_exporter service entered the stopped state.
I realise we do not run the latest release, so if this issue supposed to be fixed by that, can you confirm - maybe point me to the right direction - so we can consider upgrade?
Hello,
Same issue for me on Windows server 2019:
-> A timeout was reached (30000 milliseconds) while waiting for the windows_exporter service to connect.
-> The windows_exporter service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
Exporter: 0.18.1
time="2022-08-18T08:08:59Z" level=warning msg="No where-clause specified for service collector. This will generate a very large number of metrics!" source="service.go:48"
time="2022-08-18T08:08:59Z" level=info msg="Enabled collectors: net, os, service, system, textfile, cpu, cs, logical_disk" source="exporter.go:348"
time="2022-08-18T08:08:59Z" level=info msg="Starting windows_exporter ( version=0.18.1, branch=heads/tags/v0.18.1, revision=e07b2053af3ca708db3489f41e2fcde9941860f4)" source="exporter.go:400"
time="2022-08-18T08:08:59Z" level=info msg="Build context (go=go1.17.6, user=runneradmin@xxxxx date=20220205-08:18:35)" source="exporter.go:401"
time="2022-08-18T08:08:59Z" level=info msg="Starting server on :9182" source="exporter.go:404"
time="2022-08-18T08:08:59Z" level=info msg="TLS is disabled." source="gokit_adapter.go:38"
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
This should be resolved as of #1047. Let me know if it's still an issue.