ib-sriov-cni
Mellanox SR-IOV demo pod cannot be created
I tried to create a pod with an SR-IOV net device (a Mellanox IB VF), but the pod is stuck in ContainerCreating. I configured 4 VFs on the IB interface of the host, and I run the SR-IOV device plugin pod and the Multus CNI meta-plugin, but the SR-IOV demo pod shows the error below.
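For reference, creating the VFs on the host can be sketched through the standard SR-IOV sysfs files (`sriov_totalvfs`, `sriov_numvfs`). This is a minimal sketch, assuming it runs as root on the host and that the PF netdev name is passed in (`ib0` in the test is hypothetical); `sysfs_root` exists only so the logic can be exercised against a fake directory tree.

```python
from pathlib import Path


def set_num_vfs(ifname: str, num_vfs: int, sysfs_root: str = "/sys") -> int:
    """Create `num_vfs` SR-IOV VFs on the PF behind netdev `ifname`; return the resulting count."""
    dev = Path(sysfs_root) / "class" / "net" / ifname / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{ifname} supports 0..{total} VFs, requested {num_vfs}")

    numvfs_file = dev / "sriov_numvfs"
    current = int(numvfs_file.read_text().strip())
    if current != num_vfs:
        # The kernel refuses to change a non-zero VF count directly (EBUSY),
        # so drop to 0 first and then request the new count.
        if current != 0:
            numvfs_file.write_text("0")
        numvfs_file.write_text(str(num_vfs))
    return int(numvfs_file.read_text().strip())
```

Creating the VFs does not by itself give them GUIDs; in this thread the VFs ended up with an all-zero GUID, which is what the error below complains about.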
Multus (thick plugin) image:
./multus-daemonset-thick-plugin.yml:125: image: ghcr.io/k8snetworkplumbingwg/multus-cni:v3.9.2-thick-amd64
ERROR
n-MacBookPro:~/20-k8s-rdma-sriov/ib-sriov-cni/deployment/examples$ kubectl describe po my-test-pod-fnjk7
Name: my-test-pod-fnjk7
Namespace: default
Priority: 0
Node: s-113-2-35/10.113.2.35
Start Time: Tue, 22 Nov 2022 20:22:33 +0800
Labels: <none>
Annotations: cni.projectcalico.org/containerID: 848157aeb2b3549aa8e2fce419c8353989ecb98ad62b1c6513f46423492f6cfd
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/networks: [{"name": "ib-sriov-network"}]
Status: Pending
IP:
IPs: <none>
Containers:
my-test-ctr:
Container ID:
Image: mellanox/rping-test
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
sleep 1000000
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
mellanox.com/mlnx_sriov_rdma_ib: 1
Requests:
mellanox.com/mlnx_sriov_rdma_ib: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2clfq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-2clfq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21s default-scheduler Successfully assigned default/my-test-pod-fnjk7 to s-113-2-35
Normal AddedInterface 21s multus Add eth0 [10.42.0.21/32] from k8s-pod-network
Warning FailedCreatePodSandBox 21s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name ef4b067661534edfacd217cb1ea3cb1b2cdd44f65ffc1067a59091a2ae6490be-net1]
Normal AddedInterface 20s multus Add eth0 [10.42.0.22/32] from k8s-pod-network
Normal AddedInterface 19s multus Add eth0 [10.42.0.23/32] from k8s-pod-network
Warning FailedCreatePodSandBox 19s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 3573ba2407bcf6bacb171e5e8b32980ff549a59de1bd8b119d89f6304ae69b7c-net1]
Warning FailedCreatePodSandBox 18s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 0cdbf8cb322a3156d88f04a52c2bea0fc51511ffa6d21b4db9aa4ae44dc858e2-net1]
Normal AddedInterface 18s multus Add eth0 [10.42.0.24/32] from k8s-pod-network
Warning FailedCreatePodSandBox 17s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 93bbd85125dc93d15558f34aa2693d13781db6d38905925814151160ef405dc9-net1]
Normal AddedInterface 17s multus Add eth0 [10.42.0.25/32] from k8s-pod-network
Warning FailedCreatePodSandBox 16s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 8ea7b9cda5014ae0e8a3f335903e83c542156c4ec8de84c80a627ef3c3473cb1-net1]
Normal AddedInterface 16s multus Add eth0 [10.42.0.26/32] from k8s-pod-network
Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 922f59df03433b78b31201f685867ac475fcb96c5b4791eecd642fe87b5ae365-net1]
Normal AddedInterface 15s multus Add eth0 [10.42.0.27/32] from k8s-pod-network
Normal AddedInterface 14s multus Add eth0 [10.42.0.28/32] from k8s-pod-network
Warning FailedCreatePodSandBox 14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 68c5c26e73706571b562dfa035e6b53e848f7cc18c85b8a3995f0a2a3c338b97-net1]
Warning FailedCreatePodSandBox 13s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 60fda0e94bf41698460e2406a00d6443299a9b176da7ed8004f39adfc2bb16e0-net1]
Normal AddedInterface 12s multus Add eth0 [10.42.0.29/32] from k8s-pod-network
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to set up pod "my-test-pod-fnjk7_default" network: [default/my-test-pod-fnjk7/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib4 GUID is not valid", failed to clean up sandbox container "777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36" network for pod "my-test-pod-fnjk7": networkPlugin cni failed to teardown pod "my-test-pod-fnjk7_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name 777d1178ca6d8681b1f0f43780fb357c0dce74a6905c94337c2f07ef9a5c9c36-net1]
Normal AddedInterface 11s multus Add eth0 [10.42.0.30/32] from k8s-pod-network
The device plugin does detect the SR-IOV net devices on the host (node s-113-2-35 in my experiment); the node's allocatable resources are:
-MacBookPro:~/20-k8s-rdma-sriov/multus-cni/deployments$ kubectl get node s-113-2-35 -o json | jq '.status.allocatable'
{
"cpu": "128",
"ephemeral-storage": "5169411933432",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"mellanox.com/mlnx_sriov_rdma_ib": "4",
"memory": "528110968Ki",
"pods": "110"
}
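The same check can be scripted, for example to compare the advertised count with the number of VFs created on the host. A minimal sketch that takes the JSON printed by `kubectl get node <name> -o json`; it assumes the extended resource is a plain integer string, as it is here (`"4"`).

```python
import json

RESOURCE = "mellanox.com/mlnx_sriov_rdma_ib"


def allocatable_count(node_json: str, resource: str = RESOURCE) -> int:
    """Return how many units of `resource` the node reports as allocatable (0 if missing).

    Only suitable for integer extended resources; quantities such as "528110968Ki" are not parsed.
    """
    node = json.loads(node_json)
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(resource, "0"))
```

Here the result would be 4, matching the 4 VFs configured on the IB interface, so resource advertisement is not the failing step.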
NetworkAttachmentDefinition (NAD)
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ib-sriov-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mlnx_sriov_rdma_ib
spec:
  config: '{
    "type": "ib-sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-network",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.217.0/24",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "192.168.217.1"
    }
  }'
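The `config` string in this NAD is plain JSON handed to the ib-sriov CNI. The sketch below parses it and reports where the VF's GUID would come from. Assumption, based on the ib-sriov-cni README rather than verified here: a GUID can be given statically with a `guid` field, or per pod through runtime config when `capabilities.infinibandGUID` is true; with neither, the plugin relies on the GUID already set on the VF.

```python
import json


def guid_source(cni_config: str) -> str:
    """Classify an ib-sriov CNI config by GUID source: "static", "runtime" or "vf".

    Assumes (per the ib-sriov-cni README) the "guid" field and the
    capabilities.infinibandGUID runtime option; "vf" means the VF's pre-set GUID is used.
    """
    conf = json.loads(cni_config)
    if conf.get("type") != "ib-sriov":
        raise ValueError(f"not an ib-sriov config: type={conf.get('type')!r}")
    if conf.get("guid"):
        return "static"
    if (conf.get("capabilities") or {}).get("infinibandGUID"):
        return "runtime"
    return "vf"
```

For the NAD above this returns `"vf"`, so a VF whose GUID was never set would presumably hit "GUID is not valid". A static `guid` would be shared by every pod using the NAD, so it is at best a single-pod workaround.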
SR-IOV device plugin ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourcePrefix": "mellanox.com",
          "resourceName": "mlnx_sriov_rdma_ib",
          "selectors": {
            "isRdma": true,
            "vendors": ["15b3"],
            "devices": ["101c"],
            "drivers": ["mlx5_core"]
          }
        }
      ]
    }
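The selectors in this ConfigMap are combined: a device must match the vendor, device ID and driver lists and, because `isRdma` is true, expose RDMA resources. The device plugin log below is consistent with that, since only the four VFs `0000:c3:00.2` to `0000:c3:00.5` are added to the pool. This is a simplified model of that filtering, not the device plugin's actual Go code; the device attribute dicts in the test are hypothetical inputs.

```python
import json

# Selector keys from the ConfigMap mapped to the PCI attribute they filter on.
SELECTOR_FIELDS = (("vendors", "vendor"), ("devices", "device"), ("drivers", "driver"))


def matches_selectors(dev: dict, selectors: dict) -> bool:
    """Simplified netDevice selector check: every non-empty list must contain the device's value."""
    for key, field in SELECTOR_FIELDS:
        allowed = selectors.get(key)
        if allowed and dev.get(field) not in allowed:
            return False
    # isRdma: true additionally requires the function to expose RDMA resources.
    if selectors.get("isRdma") and not dev.get("rdma", False):
        return False
    return True


def selectors_from_config(config_json: str, resource_name: str) -> dict:
    """Pull the selectors for one resource out of the device plugin's config.json."""
    for res in json.loads(config_json)["resourceList"]:
        if res["resourceName"] == resource_name:
            return res.get("selectors", {})
    raise KeyError(resource_name)
```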
SR-IOV device plugin logs
n-MacBookPro:~/20-k8s-rdma-sriov/multus-cni/deployments$ kubectl -n kube-system logs kube-sriov-device-plugin-amd64-bpwlk
I1122 11:59:59.507695 1 manager.go:51] Using Kubelet Plugin Registry Mode
I1122 11:59:59.508691 1 main.go:44] resource manager reading configs
I1122 11:59:59.508739 1 manager.go:79] raw ResourceList: {
"resourceList": [{
"resourcePrefix": "mellanox.com",
"resourceName": "mlnx_sriov_rdma_ib",
"selectors": {
"isRdma": true,
"vendors": ["15b3"],
"devices": ["101c"],
"drivers": ["mlx5_core"]
}
}
]
}
I1122 11:59:59.508875 1 factory.go:166] net device selector for resource mlnx_sriov_rdma_ib is &{DeviceSelectors:{Vendors:[15b3] Devices:[101c] Drivers:[mlx5_core] PciAddresses:[]} PfNames:[] RootDevices:[] LinkTypes:[] DDPProfiles:[] IsRdma:true NeedVhostNet:false}
I1122 11:59:59.508902 1 manager.go:99] unmarshalled ResourceList: [{ResourcePrefix:mellanox.com ResourceName:mlnx_sriov_rdma_ib DeviceType:netDevice Selectors:0xc00000cd38 SelectorObj:0xc000375380}]
I1122 11:59:59.508960 1 manager.go:200] validating resource name "mellanox.com/mlnx_sriov_rdma_ib"
I1122 11:59:59.508968 1 main.go:60] Discovering host devices
I1122 11:59:59.589424 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c2:00.0 02 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
I1122 11:59:59.589938 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c2:00.1 02 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
I1122 11:59:59.590256 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.0 02 Mellanox Technolo... MT28908 Family [ConnectX-6]
I1122 11:59:59.591462 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.1 02 Mellanox Technolo... MT28908 Family [ConnectX-6]
I1122 11:59:59.591704 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.2 02 Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.591894 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.3 02 Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592053 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.4 02 Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592203 1 netDeviceProvider.go:84] netdevice AddTargetDevices(): device found: 0000:c3:00.5 02 Mellanox Technolo... MT28908 Family [ConnectX-6 Virtual Fu...
I1122 11:59:59.592383 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:01:00.0 12 unknown unknown
I1122 11:59:59.592392 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:22:00.0 12 unknown unknown
I1122 11:59:59.592397 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:41:00.0 12 unknown unknown
I1122 11:59:59.592403 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:61:00.0 12 unknown unknown
I1122 11:59:59.592407 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:81:00.0 12 unknown unknown
I1122 11:59:59.592412 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:a1:00.0 12 unknown unknown
I1122 11:59:59.592417 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:c1:00.0 12 unknown unknown
I1122 11:59:59.592421 1 accelDeviceProvider.go:82] accelerator AddTargetDevices(): device found: 0000:e1:00.0 12 unknown unknown
I1122 11:59:59.592429 1 main.go:66] Initializing resource servers
I1122 11:59:59.592731 1 manager.go:105] number of config: 1
I1122 11:59:59.592739 1 manager.go:109]
I1122 11:59:59.592742 1 manager.go:110] Creating new ResourcePool: mlnx_sriov_rdma_ib
I1122 11:59:59.592746 1 manager.go:111] DeviceType: netDevice
W1122 11:59:59.592779 1 pciNetDevice.go:55] RDMA resources for 0000:c2:00.0 not found. Are RDMA modules loaded?
I1122 11:59:59.593104 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c2:00.0. error getting devlink device attributes for net device 0000:c2:00.0 no such device
W1122 11:59:59.593215 1 pciNetDevice.go:55] RDMA resources for 0000:c2:00.1 not found. Are RDMA modules loaded?
I1122 11:59:59.593362 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c2:00.1. error getting devlink device attributes for net device 0000:c2:00.1 no such device
I1122 11:59:59.594005 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.1. <nil>
I1122 11:59:59.596385 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.2. <nil>
I1122 11:59:59.597465 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.3. <nil>
I1122 11:59:59.598273 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.4. <nil>
I1122 11:59:59.599262 1 utils.go:71] Devlink query for eswitch mode is not supported for device 0000:c3:00.5. <nil>
I1122 11:59:59.599408 1 factory.go:106] device added: [pciAddr: 0000:c3:00.2, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599417 1 factory.go:106] device added: [pciAddr: 0000:c3:00.3, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599423 1 factory.go:106] device added: [pciAddr: 0000:c3:00.4, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599428 1 factory.go:106] device added: [pciAddr: 0000:c3:00.5, vendor: 15b3, device: 101c, driver: mlx5_core]
I1122 11:59:59.599446 1 manager.go:139] New resource server is created for mlnx_sriov_rdma_ib ResourcePool
I1122 11:59:59.599454 1 main.go:72] Starting all servers...
I1122 11:59:59.599885 1 server.go:199] starting mlnx_sriov_rdma_ib device plugin endpoint at: mellanox.com_mlnx_sriov_rdma_ib.sock
I1122 11:59:59.602783 1 server.go:226] mlnx_sriov_rdma_ib device plugin endpoint started serving
I1122 11:59:59.602805 1 main.go:77] All servers started.
I1122 11:59:59.602811 1 main.go:78] Listening for term signals
I1122 12:00:00.175755 1 server.go:110] Plugin: mellanox.com_mlnx_sriov_rdma_ib.sock gets registered successfully at Kubelet
I1122 12:00:00.175875 1 server.go:134] ListAndWatch(mlnx_sriov_rdma_ib) invoked
I1122 12:00:00.175890 1 server.go:142] ListAndWatch(mlnx_sriov_rdma_ib): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:c3:00.4,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.5,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},&Device{ID:0000:c3:00.3,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:1,},},},},},}
I1122 12:04:42.983933 1 server.go:119] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:c3:00.3],},},}
I1122 12:04:42.984024 1 netResourcePool.go:51] GetDeviceSpecs(): for devices: [0000:c3:00.3]
I1122 12:04:42.984044 1 pool_stub.go:97] GetEnvs(): for devices: [0000:c3:00.3]
I1122 12:04:42.984052 1 pool_stub.go:113] GetMounts(): for devices: [0000:c3:00.3]
I1122 12:04:42.984059 1 server.go:128] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_MELLANOX_COM_MLNX_SRIOV_RDMA_IB: 0000:c3:00.3,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/infiniband/issm3,HostPath:/dev/infiniband/issm3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/umad3,HostPath:/dev/infiniband/umad3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/uverbs3,HostPath:/dev/infiniband/uverbs3,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/rdma_cm,HostPath:/dev/infiniband/rdma_cm,Permissions:rwm,},},Annotations:map[string]string{},},},}
I1122 12:22:33.340229 1 server.go:119] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:c3:00.4],},},}
I1122 12:22:33.340326 1 netResourcePool.go:51] GetDeviceSpecs(): for devices: [0000:c3:00.4]
I1122 12:22:33.340347 1 pool_stub.go:97] GetEnvs(): for devices: [0000:c3:00.4]
I1122 12:22:33.340355 1 pool_stub.go:113] GetMounts(): for devices: [0000:c3:00.4]
I1122 12:22:33.340362 1 server.go:128] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_MELLANOX_COM_MLNX_SRIOV_RDMA_IB: 0000:c3:00.4,},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/infiniband/issm4,HostPath:/dev/infiniband/issm4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/umad4,HostPath:/dev/infiniband/umad4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/uverbs4,HostPath:/dev/infiniband/uverbs4,Permissions:rwm,},&DeviceSpec{ContainerPath:/dev/infiniband/rdma_cm,HostPath:/dev/infiniband/rdma_cm,Permissions:rwm,},},Annotations:map[string]string{},},},}
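The allocation at 12:22:33 (UTC) matches the first pod's start time of 20:22:33 +0800 and hands out `0000:c3:00.4`, while the failing events name `VF ib4`. One way to confirm which netdev belongs to an allocated VF is to list the `net/` directory of that PCI function in sysfs. A minimal sketch, assuming the standard Linux sysfs layout; `sysfs_root` is there only so the function can be tested against a fake tree.

```python
from pathlib import Path


def pci_to_netdevs(pci_addr: str, sysfs_root: str = "/sys") -> list:
    """Return the netdev names (e.g. "ib4") bound to a PCI function, or [] if it has none."""
    net_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "net"
    if not net_dir.is_dir():
        return []
    return sorted(p.name for p in net_dir.iterdir())
```

On the host, `pci_to_netdevs("0000:c3:00.4")` should then name the VF netdev whose GUID needs to be set.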
I printed the GUID, and it is all zeros (see the HardwareAddr and guid fields in the error below). How can I fix this?
n-MacBookPro:~/20-k8s-rdma-sriov/ib-sriov-cni/deployment/examples$ kubectl describe pod my-test-pod
Name: my-test-pod
Namespace: default
Priority: 0
Node: s-113-2-35/10.113.2.35
Start Time: Tue, 22 Nov 2022 22:02:12 +0800
Labels: <none>
Annotations: cni.projectcalico.org/containerID: dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/networks: [{"name": "ib-sriov-network"}]
Status: Pending
IP:
IPs: <none>
Containers:
my-test-ctr:
Container ID:
Image: mellanox/rping-test
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
sleep 1000000
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
mellanox.com/mlnx_sriov_rdma_ib: 1
Requests:
mellanox.com/mlnx_sriov_rdma_ib: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw2sr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jw2sr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <invalid> default-scheduler Successfully assigned default/my-test-pod to s-113-2-35
Normal AddedInterface <invalid> multus Add eth0 [10.42.0.219/32] from k8s-pod-network
Warning FailedCreatePodSandBox <invalid> kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to set up pod "my-test-pod_default" network: [default/my-test-pod/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid, HardwareAddr:00:00:00:e7:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00, guid:00:00:00:00:00:00:00:00", failed to clean up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to teardown pod "my-test-pod_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd-net1]
Normal SandboxChanged <invalid> kubelet Pod sandbox changed, it will be killed and re-created.
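The `HardwareAddr` in this error is the 20-byte IPoIB link-layer address: 4 bytes of flags and queue-pair number followed by the 16-byte port GID, whose high 8 bytes are the subnet prefix (`fe:80:...`) and whose low 8 bytes are the port GUID. Here those 8 bytes are all zero, which matches `guid:00:00:00:00:00:00:00:00` in the same message. The sketch below extracts and checks the GUID that way; it illustrates the address layout and is not ib-sriov-cni's own validation code.

```python
def guid_from_ipoib_hwaddr(hwaddr: str) -> str:
    """Extract the 8-byte port GUID from a 20-byte IPoIB link-layer address.

    Layout: 4 bytes (flags + QPN), 8-byte GID subnet prefix, 8-byte port GUID.
    """
    octets = hwaddr.lower().split(":")
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return ":".join(octets[12:])


def guid_is_zero(guid: str) -> bool:
    """True if every octet of the GUID is 0x00, i.e. no GUID has been assigned."""
    return all(int(octet, 16) == 0 for octet in guid.split(":"))
```

The address of a VF netdev can be read from `ip link show <vf-netdev>` (the `link/infiniband` field).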
I ran into the same problem. You first need to configure the VF's node GUID and port GUID, then run `ibdev2netdev -v` to check that the VF's status is up; after that the VF can be used normally.
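Assigning the GUIDs mentioned in the reply above can be done with iproute2's `ip link set ... vf N node_guid` / `port_guid` options, followed by unbinding and rebinding the VF from `mlx5_core` so the driver picks up the new values (a step NVIDIA's mlx5 SR-IOV guides also use). The sketch only builds the command strings (a dry run) so they can be reviewed before running them as root. The PF name `ib0`, the VF index, and the GUID scheme in `make_guid` are hypothetical; GUIDs must be unique on the fabric, and the VF index for a PCI address can be checked through the `/sys/class/net/<pf>/device/virtfn<N>` symlinks.

```python
def make_guid(prefix: str, index: int) -> str:
    """Hypothetical GUID scheme: 7 fixed prefix octets plus 1 octet holding the VF index."""
    octets = prefix.lower().split(":")
    if len(octets) != 7:
        raise ValueError("prefix must have exactly 7 octets")
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in one octet")
    return ":".join(octets + [f"{index:02x}"])


def vf_guid_commands(pf: str, vf_index: int, vf_pci: str, node_guid: str, port_guid: str) -> list:
    """Build (but do not run) the commands that assign node/port GUIDs to one VF."""
    return [
        f"ip link set dev {pf} vf {vf_index} node_guid {node_guid}",
        f"ip link set dev {pf} vf {vf_index} port_guid {port_guid}",
        # Rebind the VF so mlx5_core picks up the new GUIDs.
        f"echo {vf_pci} > /sys/bus/pci/drivers/mlx5_core/unbind",
        f"echo {vf_pci} > /sys/bus/pci/drivers/mlx5_core/bind",
    ]
```

After running the generated commands, `ibdev2netdev -v` should show whether the VF came up.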
Hey @zhutong196, could you tell me how to configure the VF node GUID and port GUID?