k8s-rdma-sriov-dev-plugin
Failed to Create QP
I tried to deploy the RDMA device plugin in HCA mode in my Kubernetes cluster. I followed the instructions and the device plugin registered successfully. If I run "kubectl describe node [node_name]", I can see the rdma/hca resource. If I run "ibstat" in the pods, the InfiniBand information shows up and the port state is active/up.
However, when I tried to run a connection test using "ib_read_bw", it failed with the following error: "Couldn't get device attribute. Unable to create QP. Failed to create QP. Couldn't create IB resource."
I ran the test by starting "ib_read_bw" in one pod and "ib_read_bw [target_pod_ip_addr]" in another pod. Could anyone please help with this issue? I appreciate your help.
@zlwfrank the container might not have the IPC_LOCK capability.
Refer to the example here and add the "IPC_LOCK" line in the appropriate place:
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/mlnx_ofed_linux-4.4-1.0.0.0-centos7.4
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
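One way to confirm from inside the running container whether the capability actually took effect is to read the effective capability mask of a process. The following is only a rough sketch, assuming a Linux container with Python available; the helper name `has_ipc_lock` is made up for illustration. It relies on CAP_IPC_LOCK being capability number 14 in the Linux capability list, and it also prints the locked-memory limit, which is a related setting that can matter for RDMA memory registration. A passing check does not prove that QP creation will succeed; it only rules out this one cause.

```python
import os
import resource

# CAP_IPC_LOCK is capability number 14 in linux/capability.h.
CAP_IPC_LOCK = 14


def has_ipc_lock(status_text: str) -> bool:
    """Return True if the CapEff mask in /proc/<pid>/status text has CAP_IPC_LOCK set.

    Hypothetical helper for illustration, not part of the device plugin.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed in hex
            return bool(mask & (1 << CAP_IPC_LOCK))
    return False  # no CapEff line found


def main() -> None:
    status_path = "/proc/self/status"
    if os.path.exists(status_path):
        with open(status_path) as f:
            print("CAP_IPC_LOCK effective:", has_ipc_lock(f.read()))
    else:
        print("No /proc/self/status here; this check assumes Linux.")

    # Locked-memory limit of the current process (soft, hard), in bytes.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value: int) -> str:
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    print("RLIMIT_MEMLOCK soft/hard:", fmt(soft), "/", fmt(hard))


if __name__ == "__main__":
    main()
```

The script can be copied into the pod and run with the same user that runs ib_read_bw, since capabilities and limits are per process.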
@paravmellanox Thanks for the reply. Actually, I was using the provided sample .yaml file, and the IPC_LOCK capability had already been added.
This is the file I used:
apiVersion: v1
kind: Pod
metadata:
  name: ib-test-pod-1
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
@zlwfrank have you resolved this problem? I see the same "Failed to create QP" symptom when running ib_read_bw inside a container, and I have no idea how to deal with it.