Limit which PFs are queried on Mellanox cards
https://docs.nvidia.com/dgx/dgx-os-7-user-guide/known_issues.html#id16
With this note:
Accessing PF0 and PF1 is restricted. Currently, there is no temporary solution available.
[11176.517416] mlx5_core 0000:05:00.0: mlx5_cmd_out_err:835:(pid 18360): ACCESS_REG(0x805) op_mod(0x1) failed, status bad operation(0x2), syndrome (0x9a6171), err(-22)
[11176.534892] mlx5_core 0000:05:00.0: mlx5_cmd_out_err:835:(pid 18360): ACCESS_REG(0x805) op_mod(0x1) failed, status bad operation(0x2), syndrome (0x9a6171), err(-22)
[11176.589308] mlx5_core 0000:05:00.1: mlx5_cmd_out_err:835:(pid 10354): ACCESS_REG(0x805) op_mod(0x1) failed, status bad operation(0x2), syndrome (0x9a6171), err(-22)
[11176.607052] mlx5_core 0000:05:00.1: mlx5_cmd_out_err:835:(pid 10354): ACCESS_REG(0x805) op_mod(0x1) failed, status bad operation(0x2), syndrome (0x9a6171), err(-22)
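To see at a glance which PCI functions the errors come from, the log lines can be summarized with a short script. This is only a minimal sketch, assuming the dmesg layout shown above; `summarize_mlx5_errors` and the `MLX5_ERR` pattern are hypothetical names, not part of node_exporter or the mlx5 driver.

```python
import re
from collections import Counter

# Matches the PCI address and the failing command in mlx5_core error lines.
# Assumption: the lines follow the exact dmesg layout pasted above.
MLX5_ERR = re.compile(
    r"mlx5_core (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"mlx5_cmd_out_err:\d+:\(pid \d+\): (?P<cmd>\w+)\((?P<opcode>0x[0-9a-f]+)\)"
)


def summarize_mlx5_errors(lines):
    """Count failed mlx5 commands per (PCI address, command name)."""
    counts = Counter()
    for line in lines:
        match = MLX5_ERR.search(line)
        if match:
            counts[(match["pci"], match["cmd"])] += 1
    return counts
```

Feeding it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`) should show whether only the `.0` and `.1` functions of the card are affected.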
I didn't find an existing GitHub issue about this here. Could there be a way to limit which PFs node_exporter tries to query?
For now I have disabled ethtool metric collection, and the messages are no longer being logged. I hope there will be a better solution.
To get rid of the kernel messages, I needed to disable the infiniband collector with --no-collector.infiniband.
We aren't using collector.ethtool, and disabling all the other net* collectors did not get rid of the messages.
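Since the messages are tied to specific PCI functions (0000:05:00.0 and 0000:05:00.1 in the log above), it may help to know which netdev and InfiniBand device names belong to them. Below is a minimal sketch, assuming the usual Linux sysfs layout where a PCI device directory contains `net/` and `infiniband/` subdirectories. `devices_for_pf` is a hypothetical helper, and the `sysfs_root` parameter exists only so the function can be tried against a test tree.

```python
from pathlib import Path


def devices_for_pf(pci_addr, sysfs_root="/sys"):
    """Return the netdev and InfiniBand device names exposed by one PCI function."""
    base = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    result = {}
    for kind in ("net", "infiniband"):
        subdir = base / kind
        # A missing subdirectory just means the function exposes no such device.
        result[kind] = sorted(p.name for p in subdir.iterdir()) if subdir.is_dir() else []
    return result
```

Calling `devices_for_pf("0000:05:00.0")` on the affected host would list the names to look for in the collector output. Whether a given node_exporter version offers a per-device filter for the infiniband collector is something to check in its `--help` output; this sketch does not assume one exists.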