
New disks not detected until service restart

Open m49808 opened this issue 2 years ago • 1 comments

When a new disk is added to a Windows server (a completely online operation) via a RAID controller or by attaching it to a VM, windows_exporter does not pick up the new drive until the service is restarted. Once it is restarted, the new drive is picked up right away and monitored. This can lead to a dangerous situation where a newly added drive goes unmonitored for months, until the next patch cycle or reboot.

It would be better if there were some interim polling cycle (hourly?) to pick up config changes like this. Otherwise we have to either script a restart of the service or remember to restart it manually (neither is ideal) to account for this.

I'm not sure if there are other situations where this occurs, but I've definitely seen it on drive adds.

m49808 avatar Aug 02 '22 12:08 m49808

This needs confirmation, but I suspect this is due to the Perflib countersets only being populated once during the collector init() via the registerCollector function.

If so, this may be difficult to resolve, as the collectors retain essentially no state between scrapes. The exporter's scrape process is driven by the client (typically Prometheus); the exporter isn't designed to do work between scrapes and keeps only minimal state. Adding timing functionality that runs independently of the main scrape process could be done, but would be tricky.

In my mind there are several hypothetical solutions:

  1. Leave code as-is and document the need to manually restart the exporter when adding disks.
  2. Call addPerfcounterDependecies on each scrape of the logical_disk collector, updating the Perflib countersets to be scraped.
  3. Introduce a new goroutine for identifying countersets in need of updating and triggering an update on the next collector scrape.

I'd personally rule 3) out immediately due to the required complexity, and introduction of state between scrapes. 2) could be done but may have performance concerns.
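
To make the difference between the suspected current behaviour and option 2) concrete, here is a minimal sketch. It is not the exporter's actual code: `diskSource`, `initOnceCollector`, and `perScrapeCollector` are hypothetical names, and a plain slice of drive letters stands in for the Perflib countersets. It only shows that a list resolved once at init goes stale, while one resolved on every scrape does not (at the cost of doing the lookup each time).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// diskSource is a hypothetical stand-in for whatever enumerates disk
// instances on the host (the real exporter reads Perflib countersets).
type diskSource func() []string

// initOnceCollector resolves the instance list a single time when it is
// constructed, mirroring the behaviour suspected in this issue.
type initOnceCollector struct {
	instances []string
}

func newInitOnceCollector(src diskSource) *initOnceCollector {
	return &initOnceCollector{instances: src()}
}

// Scrape reports only the instances known at construction time.
func (c *initOnceCollector) Scrape() string {
	return strings.Join(c.instances, ",")
}

// perScrapeCollector re-resolves the instance list on every scrape,
// which is the idea behind option 2). No state is kept between scrapes.
type perScrapeCollector struct {
	src diskSource
}

// Scrape queries the source again, so newly added disks appear.
func (c *perScrapeCollector) Scrape() string {
	instances := c.src()
	sort.Strings(instances)
	return strings.Join(instances, ",")
}

func main() {
	disks := []string{"C:"}
	// Return a copy so collectors cannot alias the backing slice.
	src := func() []string { return append([]string(nil), disks...) }

	once := newInitOnceCollector(src)
	each := &perScrapeCollector{src: src}

	// Simulate a disk being added while the exporter keeps running.
	disks = append(disks, "D:")

	fmt.Println("init-once: ", once.Scrape())
	fmt.Println("per-scrape:", each.Scrape())
}
```

In this sketch the init-once collector still reports only `C:` after the add, while the per-scrape collector reports both drives. Whether the extra lookup per scrape is acceptable in the real collector would need measuring.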

breed808 avatar Aug 20 '22 11:08 breed808

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Nov 24 '23 16:11 github-actions[bot]