node-exporter-textfile-collector-scripts icon indicating copy to clipboard operation
node-exporter-textfile-collector-scripts copied to clipboard

smartmon.sh: allow collecting of additional attributes

Open ZigZagT opened this issue 3 years ago • 10 comments

Allowing additional smart attributes to be collect by providing an environment variable SMARTMONATTRS

For example

export SMARTMONATTRS="$(cat <<'MOREATTRS'
  multi_zone_error_rate
  percent_lifetime_remain
  power_off_retract_count
  reallocate_nand_blk_cnt
  total_host_sector_write
MOREATTRS
)"

./smartmon.sh
# the output will now include the `percent_lifetime_remain`, `power_off_retract_count`, etc, attributes

example output:

...
# HELP smartmon_power_off_retract_count_raw_value SMART metric power_off_retract_count_raw_value
# TYPE smartmon_power_off_retract_count_raw_value gauge
smartmon_power_off_retract_count_raw_value{disk="/dev/sda",type="sat",smart_id="192"} 9.000000e+00
# HELP smartmon_power_off_retract_count_threshold SMART metric power_off_retract_count_threshold
# TYPE smartmon_power_off_retract_count_threshold gauge
smartmon_power_off_retract_count_threshold{disk="/dev/sda",type="sat",smart_id="192"} 0
# HELP smartmon_power_off_retract_count_value SMART metric power_off_retract_count_value
# TYPE smartmon_power_off_retract_count_value gauge
smartmon_power_off_retract_count_value{disk="/dev/sda",type="sat",smart_id="192"} 100
# HELP smartmon_power_off_retract_count_worst SMART metric power_off_retract_count_worst
# TYPE smartmon_power_off_retract_count_worst gauge
smartmon_power_off_retract_count_worst{disk="/dev/sda",type="sat",smart_id="192"} 100
...

ZigZagT avatar Oct 11 '21 21:10 ZigZagT

@dswarbrick @SuperQ mind taking a look at this one?

ZigZagT avatar Oct 23 '21 04:10 ZigZagT

Nice idea, what if we flip this around, make the whole list setup as a "default" value.


default_smartmon_attrs="$(
  cat <<'SMARTMONATTRS'
...
SMARTMONATTRS
)"

smartmon_attrs="${SMARTMON_ATTRS:-${default_smartmon_attrs}}"

This way the user can provide their own list over the default.

SuperQ avatar Oct 23 '21 07:10 SuperQ

@SuperQ I actually thought about it while working on this branch. Didn't go with it, here's why:

  1. I anticipate the primary use case of the feature would be "adding" metrics (i.e. nvme specific metrics, vendor specific metrics) rather than removing default ones. Most of the default metrics are very essential (i.e. smartmon_device_info) that I can't imagine any reasonable use case that can go without them.
  2. There's some (perf, storage) overhead for collecting additional attributes, but I personally think it's negligible considering the extraordinary efficiency provided by Prometheus ( that's one reason why we all love Prometheus :P ). A couple of attributes a user collects but never uses wouldn't hurt anything.

ZigZagT avatar Oct 24 '21 00:10 ZigZagT

I wonder, is it worth maintaining both a shell script and a Python version of this collector? If so, why?

IMHO, the Python version would be considerably easier to enhance, especially if it was refactored to parse the JSON output of smartctl.

dswarbrick avatar Oct 24 '21 10:10 dswarbrick

@dswarbrick Originally I thought it would be nice to have examples for both Bash and Python. Some people may not have Python installed. But, maybe you're right, it's time to deprecate the Bash version.

SuperQ avatar Oct 24 '21 10:10 SuperQ

@ZigZagT One reason why I personally don't like to give users extensible options for some of the official Prometheus exporters is it reduces how much work is contributed upstream.

A good example of this behavior is the postgres_exporter vs the mysqld_exporter. The mysqld_exporter is not really user modifiable unless you write Go. It has a huge number of collectors, control flags, good defaults, etc.

The postgres_exporter by comparison has almost no good collector defaults because everyone just uses the queries.yml file to add more metrics. So this becomes a bad source of copy-pasta and silos of knowledge about monitoring.

SuperQ avatar Oct 24 '21 10:10 SuperQ

@SuperQ @dswarbrick I don't know how much you know about the python script, but in short, the python script is not compatible with the bash version at all. They produce completely different metrics. I would have to rewrite the entire script to make it compatible -- which I feel is too much work compared with the simple fix in the bash script.

@SuperQ

One reason why I personally don't like to give users extensible options for some of the official Prometheus exporters is it reduces how much work is contributed upstream.

I'm not sure what you mean. If you're saying adding extensibility to scripts in this repo is a no-goal then feel free to close the PR. If you're saying SMART info should be moved into a dedicated exporter then I'd argue it should be part of the node_exporter itself.

ZigZagT avatar Oct 25 '21 00:10 ZigZagT

Also, since the bash version works perfectly fine, and the code is perfectly maintainable, I really don't see any reason to spend time on the python script

ZigZagT avatar Oct 25 '21 00:10 ZigZagT

So do we have a decision on this PR? @SuperQ @dswarbrick

ZigZagT avatar Jan 13 '22 23:01 ZigZagT

any updates so far?

ZigZagT avatar Jan 24 '22 18:01 ZigZagT

closing due to no response from the maintainer.

ZigZagT avatar Oct 30 '22 00:10 ZigZagT