node_exporter Add hw_counters metrics for infiniband device.

Add hw_counters metrics for infiniband device.

ref: https://github.com/prometheus/procfs/pull/549

Oct 17 '23 09:10 dongjiang1989

Beside these, maybe consider combining metrics that can be summed up (like multiple types of errors etc) into one with different labels. See https://prometheus.io/docs/practices/naming/

Oct 19 '23 09:10 discordianfish

Hi, how is progress on this? We would love to have these counters :)

Jan 29 '24 22:01 achow

thanks @discordianfish. Please re-check

Feb 02 '24 09:02 dongjiang1989

This would be really useful for us, any update?

Feb 20 '24 16:02 jgrund

@SuperQ Please recheck it, thanks.

Feb 21 '24 16:02 dongjiang1989

Interested in this PR too.

Feb 23 '24 14:02 LouisDDN

Ping again

Mar 14 '24 15:03 jgrund

The general idea also applies to the other metrics. We need to carefully think about the names since changing them in the future is a breaking change that will need to be enabled by adding a flag etc etc. So sorry if that feels pedentic but we need to make sure we get these names right.

Also, see my other comment here: https://github.com/prometheus/node_exporter/pull/2827#issuecomment-1770448902

PS: Sorry for not being responsive. I don't get paid to work on this so my availability is limited :-/

Mar 21 '24 11:03 discordianfish

The general idea also applies to the other metrics. We need to carefully think about the names since changing them in the future is a breaking change that will need to be enabled by adding a flag etc etc. So sorry if that feels pedentic but we need to make sure we get these names right.

Also, see my other comment here: #2827 (comment)

PS: Sorry for not being responsive. I don't get paid to work on this so my availability is limited :-/

Thank you for your review. I will fix these issues as soon as possible. 🙏

Mar 21 '24 16:03 dongjiang1989

@discordianfish Please re-check, thanks

https://github.com/prometheus/node_exporter/pull/2827/commits/8b5605040ae8f57566211fd7e557dc12363f5965

Mar 22 '24 01:03 dongjiang1989

Better, thanks! Still, can you consider combining some of these metrics using the same name and different labels?

Mar 22 '24 11:03 discordianfish

Better, thanks! Still, can you consider combining some of these metrics using the same name and different labels?

Thanks for your review. Do you have any suggestions to combine some of these indicators using the same name and different labels? @discordianfish

Mar 26 '24 09:03 dongjiang1989

@dongjiang1989 That depends on the metrics, I don't really have the resources to go through all these and suggest something. See the linked best practices, particular this quote:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not.

One candidate might be the error metrics where you could have one and distinguish between them via some sort of error type label etc

Apr 10 '24 10:04 discordianfish

The hw_counters is still important metrics, Hope to continue updating！

Jun 04 '24 11:06 xuchenhui-5

node_exporter node_exporter copied to clipboard

Add hw_counters metrics for infiniband device.

node_exporter
node_exporter copied to clipboard