sriov-network-operator
infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid"
Hi Team,
I think I am really close to getting it to work, but I get this when describing my test pod:
Normal AddedInterface 2s multus Add eth0 [10.233.117.195/32] from cni0
Warning FailedCreatePodSandBox 1s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "88aee8b5e04d60a0fdfe9437888521e2f49a170b67ce574adf571f05f644bf74": [default/test-sriov-ib-pod:example-sriov-ib-network]: error adding container to network "example-sriov-ib-network": infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid"
I use the following manifests to test:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: infiniband-sriov
  namespace: cattle-sriov-system
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    deviceID: "101c"
  linkType: ib
  isRdma: true
  numVfs: 4
  priority: 90
  resourceName: mlnxnics
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: example-sriov-ib-network
  namespace: cattle-sriov-system
spec:
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.5.225/28"
    }
  resourceName: mlnxnics
  linkState: enable
  networkNamespace: default
---
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-sriov-ib-network
spec:
  containers:
    - name: test-sriov-ib-pod
      image: centos/tools
      imagePullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - sleep inf
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          rancher.io/mlnxnics: "1"
        limits:
          rancher.io/mlnxnics: "1"
Can you give me advice on how to fix it? Thanks a lot.
Hi @e0ne @seb-835, any update on this issue, or can we close it?
No update on this case, I am still having the issue. Any help to solve it would be appreciated.
Greetings!
After the config daemon has configured SR-IOV on the node, and before an IB workload is scheduled on it, what are the VFs' hardware addresses?
According to the CNI failure, they seem to be all zeroes or all ones:
https://github.com/k8snetworkplumbingwg/ib-sriov-cni/blob/5473e6b97fa532233221a5e2ee06aa182457ffc0/pkg/sriov/sriov.go#L259
What OS and kernel are you using? The kernel may not support getting/setting the VF port and node GUID.
In the sriov-network-config-daemon logs, do you see an error after the "setVfGuid()" log message?
Can you add the sriov-network-config-daemon logs from when it tries to configure SR-IOV for the node?
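To make the "all zeroes or ones" condition concrete, here is a minimal Python sketch of such a check. It is not the CNI's actual code (that lives in the Go file linked above); the function names are hypothetical, and it assumes the VF's GUID is taken from the last 8 bytes of the 20-byte IPoIB hardware address shown by `ip link show` for the VF interface.

```python
def parse_guid(text: str) -> bytes:
    """Parse a colon-separated GUID such as '02:00:00:00:00:00:00:01' into 8 bytes."""
    raw = bytes(int(part, 16) for part in text.strip().split(":"))
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return raw


def guid_from_ipoib_hwaddr(hwaddr: str) -> bytes:
    """Return the last 8 bytes of a 20-byte IPoIB hardware address.

    Assumption: those trailing 8 bytes are the GUID that gets validated.
    """
    raw = bytes(int(part, 16) for part in hwaddr.strip().split(":"))
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    return raw[-8:]


def is_usable_guid(guid: bytes) -> bool:
    """Treat an all-zero or all-ones GUID as unset, i.e. not valid."""
    return len(guid) == 8 and guid not in (b"\x00" * 8, b"\xff" * 8)


if __name__ == "__main__":
    # Made-up address of a VF whose GUID was never assigned.
    hw = "00:00:00:00:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
    print(is_usable_guid(guid_from_ipoib_hwaddr(hw)))
```

If the VF addresses on the node fail a check like this after the config daemon has run, the GUIDs were probably never written, which would point to the daemon or the kernel rather than to the pod manifest.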
I also have the same problem and don't know how to solve it. If anyone knows how to solve it, please contact me. My email is [email protected]
Hi @seb-835 @fu7100, any update on this issue? We are waiting for some logs. If you manage to make it work, let me know and I will close this issue. Thanks!
I have the same problem: "infiniBand SRI-OV CNI failed to configure VF "VF ib9 GUID is not valid"". I solved it by manually configuring the node, port and policy of the VF. However, I am puzzled: the plug-in should configure the VF's information automatically instead of requiring me to do it manually. What is the reason for this? Can you help me solve it? Thank you very much.
Hi @frye233
Could you tell me how you manually configured the node/port GUID of the VF? I have the same issue with a raw ib-sriov CNI and device plugin deployment. In addition, the VFs I created all remain DOWN, and I don't know how to bring them up even though the IB PF is UP. Thanks.
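The thread does not say how the manual configuration was done. For reference, one possible way on mlx5 devices is the per-VF sysfs interface described in NVIDIA's SR-IOV documentation. The Python sketch below only illustrates that idea: the device name `mlx5_0`, the GUID values and the helper names are assumptions, and the sysfs layout (`device/sriov/<vf>/node`, `port`, `policy`) should be verified on the node before writing anything.

```python
from pathlib import Path

# Assumed sysfs root for InfiniBand devices.
SYSFS_IB = Path("/sys/class/infiniband")


def vf_guid_plan(ib_dev, vf, node_guid, port_guid, policy="Follow", root=SYSFS_IB):
    """Return (path, value) pairs that would set node GUID, port GUID and link policy."""
    if policy not in ("Down", "Up", "Follow"):
        raise ValueError(f"unexpected policy: {policy}")
    base = Path(root) / ib_dev / "device" / "sriov" / str(vf)
    return [
        (base / "node", node_guid),
        (base / "port", port_guid),
        (base / "policy", policy),
    ]


def apply_plan(plan):
    """Write each value to its file. Needs root privileges on a real node."""
    for path, value in plan:
        path.write_text(value + "\n")


if __name__ == "__main__":
    # Only print the plan; the GUIDs here are made-up example values.
    plan = vf_guid_plan("mlx5_0", 0,
                        "11:22:33:44:77:66:77:90",
                        "11:22:33:44:77:66:77:91")
    for path, value in plan:
        print(f"{value} -> {path}")
```

With the `Follow` policy the VF link state is meant to track the PF, which may also be relevant to VFs that stay DOWN. The VF driver usually has to be unbound and bound again before new GUIDs take effect.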
Any update on this issue, or can we close it?
No update, closing this one.