smartctl_exporter Scrape timeout with many disks

Open pokab opened this issue 1 year ago • 2 comments

I manage a Linux box with around 20-25 HDDs. Some of these disks reply to smartctl quickly; others are pretty slow, taking around 1-2 seconds each. Setting the scrape interval to 60 seconds and the scrape timeout to 40 seconds does not prevent regular scrape timeouts. A previous tool I wrote (not specific to Prometheus) spawned the smartctl subprocesses in parallel for all the HDDs, and it works perfectly. Would a solution like this be appropriate for this software? Maybe with an option to enable or disable it?

Jan 20 '24 17:01 pokab

This should be possible here with a goroutine worker pool. We can parallelize the data collection.

It doesn't look like the current collector records any timing information, even in debug mode. Something we can improve as well.

Jan 22 '24 16:01 SuperQ

I've been using my forked version (see https://github.com/prometheus-community/smartctl_exporter/pull/204) for three weeks without any problem. It successfully solved the scrape issue. Could you please give some feedback?

Mar 19 '24 14:03 pokab
