node_exporter

Add the ethtool counters related to RDMA/ROCE

Open gangxie112 opened this issue 1 year ago • 4 comments

Hi,

It seems that some important ethtool metrics related to RDMA/ROCE are not supported, such as tx.pause.ctrl.phy and rx.prio5.pause. These counters are very important in ROCE networks and are part of the physical port and priority port counter groups.

So, do we have any plan to support them?

gangxie112 • Sep 29 '24 06:09

Dunno how ethtool retrieves them, but if there is a way to retrieve them without requiring privileges, we're open to a PR for that.

discordianfish • Oct 01 '24 15:10

Is tx_pause_ctrl_phy vendor or model specific? The only reference to it I can find is for the Mellanox ConnectX series of NICs which use the mlx5 driver, https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/mellanox/mlx5/counters.html

In addition to the basic set of ethtool counters which are mature and implemented by pretty much every NIC, there are also quite a few vendor-specific ethtool stats / options.

dswarbrick • Oct 01 '24 22:10

Yes, those metrics are proprietary to specific NIC vendors. But since some NICs are widely used, we should at least consider some other way to support them, such as adding a plugin framework. At this time, users have to develop an agent to gather and push the metrics. This is the typical approach adopted by many cloud providers, as far as I know.

gangxie112 • Oct 08 '24 01:10

The textfile collector feature is arguably the "plugin framework" in node_exporter.
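As a rough illustration of the textfile route, here is a minimal sketch that has not been tested against real mlx5 hardware. It assumes that `ethtool -S <device>` prints one `name: value` pair per line and that node_exporter is started with `--collector.textfile.directory` pointing at the directory the script writes to. The metric name `ethtool_vendor_stat`, the `pause` filter pattern and the function names are all hypothetical, chosen only for this example.

```python
import os
import re
import subprocess
import sys
import tempfile

# Hypothetical metric name for this sketch; not an existing node_exporter metric.
METRIC = "ethtool_vendor_stat"


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name:
            continue
        try:
            stats[name] = int(value)
        except ValueError:
            # Skips the "NIC statistics:" header and any non-numeric values.
            continue
    return stats


def escape_label(value):
    """Escape backslashes and double quotes for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def render(device, stats, pattern=r"pause"):
    """Render the counters whose names match `pattern` in the text exposition format."""
    keep = re.compile(pattern)
    # "untyped" because this sketch does not know which stats are true counters.
    lines = [f"# TYPE {METRIC} untyped"]
    for name in sorted(stats):
        if keep.search(name):
            labels = f'device="{escape_label(device)}",counter="{escape_label(name)}"'
            lines.append(f"{METRIC}{{{labels}}} {stats[name]}")
    return "\n".join(lines) + "\n"


def write_prom(path, body):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, path)


def collect(device):
    """Run `ethtool -S` for one device and parse its output."""
    result = subprocess.run(
        ["ethtool", "-S", device], check=True, capture_output=True, text=True
    )
    return parse_ethtool_stats(result.stdout)


if __name__ == "__main__":
    # Usage: script.py <device> <path/to/output.prom>
    device, out_path = sys.argv[1], sys.argv[2]
    write_prom(out_path, render(device, collect(device)))
```

Run from cron or a systemd timer, the script would refresh a `.prom` file that the textfile collector picks up on the next scrape. Writing to a temporary file and renaming it means a scrape should never see a half-written file.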

Implementing support natively for vendor- / hardware-specific counters is tricky without having access to said hardware for testing. I would suggest either attempting to implement this yourself (assuming that you have access to such hardware and are a reasonably proficient Go developer), or loaning some hardware to a developer who is willing to do the work.

dswarbrick • Oct 08 '24 10:10