k8s-rdma-shared-dev-plugin
Do you support the ConnectX-6 network interface card? Why are the resource Capacity and Allocatable values 0 in the k8s cluster?
Do you support the ConnectX-6 network interface card?

Hi @sober-wang. ConnectX-6 is a supported NIC.
But in my k8s node description, the Capacity and Allocatable values are 0.
My environment:
- OS: Ubuntu 20.04
- Kubernetes version: 1.23
- kubelet --root-dir: /data/kubelet
My plugin configuration and workload:
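For reference, a shared-device plugin config for this kind of setup is typically supplied via a ConfigMap. The sketch below follows the plugin's `configList` format; the resource name and the `ifNames` selector (`ens24np0`) are illustrative, not taken from the original post:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
      "configList": [
        {
          "resourceName": "hca_shared_devices_a",
          "rdmaHcaMax": 1000,
          "selectors": {
            "ifNames": ["ens24np0"]
          }
        }
      ]
    }
```

With a config like this, matching devices should be advertised to kubelet as `rdma/hca_shared_devices_a` in the node's Capacity and Allocatable.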
root@gpu-11:~# ibdev2netdev
mlx5_0 port 1 ==> ens12f0np0 (Down)
mlx5_1 port 1 ==> ens12f1np1 (Down)
mlx5_2 port 1 ==> ens24np0 (Up)
mlx5_3 port 1 ==> ens25np0 (Up)
mlx5_4 port 1 ==> bondYW (Up)
mlx5_5 port 1 ==> ens17f1np1 (Down)
mlx5_6 port 1 ==> bondYW (Up)
mlx5_7 port 1 ==> ens18f1np1 (Down)
mlx5_8 port 1 ==> ens30np0 (Up)
mlx5_9 port 1 ==> ens31np0 (Up)
root@gpu-11:~# mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
ConnectX6(rev:0) /dev/mst/mt4123_pciconf3 df:00.0 mlx5_9 net-ens31np0 1
ConnectX6(rev:0) /dev/mst/mt4123_pciconf2 a0:00.0 mlx5_8 net-ens30np0 1
ConnectX6(rev:0) /dev/mst/mt4123_pciconf1 72:00.0 mlx5_3 net-ens25np0 0
ConnectX6(rev:0) /dev/mst/mt4123_pciconf0 58:00.0 mlx5_2 net-ens24np0 0
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf2.1 83:00.1 mlx5_7 net-ens18f1np1 1
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf2 83:00.0 mlx5_6 net-bondYW 1
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf1.1 82:00.1 mlx5_5 net-ens17f1np1 1
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf1 82:00.0 mlx5_4 net-bondYW 1
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf0.1 18:00.1 mlx5_1 net-ens12f1np1 0
ConnectX4LX(rev:0) /dev/mst/mt4117_pciconf0 18:00.0 mlx5_0 net-ens12f0np0 0
@sober-wang, I think this might be related to your use of a custom --root-dir for kubelet. If you're using the NVIDIA/Mellanox network-operator, it hardcodes the volume mounts for the pod that runs this service to the standard kubelet root path (/var/lib/kubelet).
The Kubernetes manifest in this repository has the same problem.
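A likely fix is to point the device-plugin DaemonSet's hostPath volume at the custom root-dir. This is a sketch against a hypothetical manifest excerpt (volume names are illustrative; only /data/kubelet comes from the report above):

```yaml
# Excerpt of the device-plugin DaemonSet spec. With a custom
# kubelet --root-dir, the hostPath must match it; otherwise the
# plugin registers its socket under a directory kubelet never
# watches, and the resource Capacity/Allocatable stay 0.
volumes:
  - name: device-plugin
    hostPath:
      # default is /var/lib/kubelet/device-plugins
      path: /data/kubelet/device-plugins
```

After patching the manifest, restarting the plugin pod should re-register the resources with kubelet.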
Not really related, but same idea: https://github.com/kubernetes/kubernetes/issues/120626