smartctl_exporter

Scrape timeout with many disks

Open pokab opened this issue 1 year ago • 2 comments

I manage a Linux box with around 20-25 HDDs. Some of these disks respond to smartctl quickly, while others are pretty slow, taking around 1-2 seconds each. Even with the scrape interval set to 60 seconds and the scrape timeout set to 40 seconds, scrapes regularly time out. A previous solution of mine (not specific to Prometheus) spawned the smartctl subprocesses in parallel for all the HDDs, and it works perfectly. Would an approach like this be appropriate for this software? Maybe with an option to enable or disable it?

pokab, Jan 20 '24 17:01

This should be possible here with a goroutine worker pool. We can parallelize the data collection.

It doesn't look like the current collector records any timing information, even in debug mode. Something we can improve as well.

SuperQ, Jan 22 '24 16:01

I've been using my forked version (see https://github.com/prometheus-community/smartctl_exporter/pull/204) for three weeks without any problem. It successfully solved the scrape issue. Could you please give some feedback?

pokab, Mar 19 '24 14:03