smartctl_exporter
Scrape timeout with many disks
I manage a Linux box with around 20-25 HDDs. Some of these disks reply to smartctl quickly; others are pretty slow, taking around 1-2 seconds each. Setting the scrape interval to 60 seconds and the scrape timeout to 40 seconds does not prevent regular scrape timeouts. A tool I wrote previously (not specific to Prometheus) spawns the smartctl subprocesses for all the HDDs in parallel, and it works perfectly. Would a solution like this be appropriate for this software? Maybe with an option to enable or disable it?
This should be possible here with a goroutine worker pool. We can parallelize the data collection.
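To make the worker pool idea concrete, here is a rough sketch. It is not the exporter's actual code: `scanFunc`, `result`, and `collectAll` are hypothetical names, and the scan function stands in for whatever ends up invoking smartctl for one device.

```go
package main

import (
	"fmt"
	"sync"
)

// scanFunc is a hypothetical stand-in for the code that runs smartctl
// against a single device and returns its raw output.
type scanFunc func(device string) (string, error)

// result holds the outcome of scanning one device.
type result struct {
	Device string
	Output string
	Err    error
}

// collectAll scans every device using at most `workers` goroutines at a time.
// Results are returned in the same order as the input slice.
func collectAll(devices []string, workers int, scan scanFunc) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make([]result, len(devices))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// results[i] never race with each other.
			for i := range jobs {
				out, err := scan(devices[i])
				results[i] = result{Device: devices[i], Output: out, Err: err}
			}
		}()
	}

	for i := range devices {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	devices := []string{"/dev/sda", "/dev/sdb", "/dev/sdc"}
	// Fake scanner for illustration; a real one would spawn smartctl.
	fake := func(device string) (string, error) {
		return "ok:" + device, nil
	}
	for _, r := range collectAll(devices, 2, fake) {
		fmt.Println(r.Device, r.Output, r.Err)
	}
}
```

Bounding the number of workers keeps the box from spawning 20+ smartctl processes at once; the pool size could be the option mentioned above, with a size of 1 behaving like the current sequential collection.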
It doesn't look like the current collector records any timing information, even in debug mode. That is something we can improve as well.
I've been using my forked version (see https://github.com/prometheus-community/smartctl_exporter/pull/204) for three weeks without any problem. It successfully solved the scrape issue. Could you please give some feedback?