k8s-rdma-sriov-dev-plugin

Failed to Create QP

Open zlwfrank opened this issue 4 years ago • 3 comments

I tried to deploy the RDMA device plugin in HCA mode in my Kubernetes cluster. I followed the instructions, and the device plugin registered successfully. If I run "kubectl describe node [node_name]", I can see the rdma/hca resource. If I run "ibstat" in the pods, the InfiniBand information shows up and the status is active/up.

However, when I tried to run a connection test using "ib_read_bw", it failed with the following error: "Couldn't get device attribute. Unable to create QP. Failed to create QP. Couldn't create IB resource."

I ran the test by starting "ib_read_bw" in one pod and "ib_read_bw [target_pod_ip_addr]" in another pod. Could anyone please help with this issue? I appreciate your help.

zlwfrank, Mar 19 '20 17:03

@zlwfrank The container might not have the IPC_LOCK capability.

Refer to the example here and add the "IPC_LOCK" line at the appropriate place.

    spec:
      restartPolicy: OnFailure
      containers:
      - image: mellanox/mlnx_ofed_linux-4.4-1.0.0.0-centos7.4
        name: mofed-test-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
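To confirm from inside the running container that the capability actually took effect, something like the following Python sketch could be used. It assumes a Linux container with Python 3 available; the helper names `has_ipc_lock` and `memlock_limit` are made up for illustration. It reads the effective capability mask from /proc/self/status (CAP_IPC_LOCK is bit 14 on Linux) and also prints the locked-memory limit, since RDMA verbs pin memory and a low limit is a commonly reported cause of QP creation failures. Nothing in this thread confirms that either check explains the error above.

```python
import resource

CAP_IPC_LOCK = 14  # bit index of CAP_IPC_LOCK in Linux capability masks


def has_ipc_lock(status_text):
    """Return True if the CapEff mask in /proc/<pid>/status text includes CAP_IPC_LOCK."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # mask is printed as hex without a 0x prefix
            return bool(mask & (1 << CAP_IPC_LOCK))
    return False  # no CapEff line found


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        print("CAP_IPC_LOCK effective:", has_ipc_lock(f.read()))
    soft, hard = memlock_limit()
    # resource.RLIM_INFINITY means the limit is unlimited
    print("memlock soft limit unlimited:", soft == resource.RLIM_INFINITY)
    print("memlock soft/hard (bytes):", soft, hard)
```

If the first line prints False, the securityContext was not applied to the container that is running the test.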

paravmellanox, Mar 19 '20 18:03

@paravmellanox Thanks for the reply. Actually I was using the provided sample .yaml file and the IPC_LOCK capability had been added.

This is the file I used:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ib-test-pod-1
    spec:
      restartPolicy: OnFailure
      containers:
      - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
        name: mofed-test-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
        resources:
          limits:
            rdma/hca: 1
        command:
        - sh
        - -c
        - |
          ls -l /dev/infiniband /sys/class/net
          sleep 1000000

zlwfrank, Mar 19 '20 20:03

@zlwfrank Have you resolved this problem? I got the same "Failed to create QP" symptom when running ib_read_bw inside a container, and I have no idea how to deal with it.

yh-xu, Dec 22 '20 01:12