node-exporter-textfile-collector-scripts
node-exporter-textfile-collector-scripts copied to clipboard
mellanox_hca_temp: suppress errors when virtual functions present
VFs appear as additional infiniband devices, but obviously don't report temperatures. The logs are then flooded with:
Dec 17 00:00:53 penny sh[1151952]: mopen: Operation not supported
Dec 17 00:00:53 penny sh[1151947]: mellanox_hca_temp: Failed to get temperature from InfiniBand HCA 'mlx5_2'!
Dec 17 00:00:53 penny sh[1151953]: mopen: Operation not supported
Dec 17 00:00:53 penny sh[1151947]: mellanox_hca_temp: Failed to get temperature from InfiniBand HCA 'mlx5_3'!
and so on.
The only clue I could find to recognise them as virtual is that node_guid is 0000:0000:0000:0000. I'm not sure if this is supposed to change when setting the mac address on the interfaces.
So far, with the virtual function interfaces unconfigured, the following patch suppresses the errors for me:
--- mellanox_hca_temp.orig 2021-06-27 08:55:33.406292246 +0200
+++ mellanox_hca_temp 2023-12-22 15:18:46.072149247 +0100
@@ -41,6 +41,10 @@
if test ! -d "$dev"; then
continue
fi
+ # node_guid is all zeros for Virtual Functions, which report no temp.
+ if [ "$(cat $dev/node_guid)" = "0000:0000:0000:0000" ]; then
+ continue
+ fi
device="${dev##*/}"
# get temperature