k8s-rdma-device-plugin

ibv_devinfo output "Failed to open device"

Open hiyijian opened this issue 6 years ago • 8 comments

I configured 8 VFs, and the following is the ibv_devinfo output in the demo container:

Failed to open device
Failed to open device
Failed to open device
Failed to open device
Failed to open device
hca_id: mlx4_2
        transport:                      InfiniBand (0)
        fw_ver:                         2.40.7000
        node_guid:                      0014:0500:8cc6:cd0a
        sys_image_guid:                 248a:0703:00e5:3d43
        vendor_id:                      0x02c9
        vendor_part_id:                 4100
        hw_ver:                         0x1
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 2
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand

Failed to open device
Failed to open device
Failed to open device

Is that normal? It seems that ibv_devinfo is trying to open VFs which are not assigned to this container.
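
One way to check this from inside the container is to compare the devices listed in sysfs with the device nodes actually present. The following is a minimal sketch, assuming the usual Linux layout where each `/sys/class/infiniband_verbs/uverbsN/ibdev` file names an HCA and the matching node is `/dev/infiniband/uverbsN`; the function name and its parameters are illustrative and not part of the plugin.

```python
import os
from pathlib import Path


def visible_vs_usable(sysfs_root="/sys/class/infiniband_verbs",
                      dev_root="/dev/infiniband"):
    """Map each HCA name listed in sysfs to whether its uverbs node is accessible.

    Assumption: sysfs shows every device on the host, while the container
    only receives the device node(s) of the VF assigned to it.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # skip plain files such as abi_version
        ibdev = entry / "ibdev"
        if not ibdev.is_file():
            continue
        name = ibdev.read_text().strip()
        node = Path(dev_root) / entry.name
        # A missing or unreadable node is reported as not usable.
        result[name] = os.access(node, os.R_OK | os.W_OK)
    return result


if __name__ == "__main__":
    for name, usable in visible_vs_usable().items():
        status = "usable" if usable else "listed in sysfs, no accessible device node"
        print(f"{name}: {status}")
```

If only one device is reported as usable, that would be consistent with the "Failed to open device" lines coming from the VFs that belong to other pods.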

I ran a simple Open MPI program and got an error message showing that Open MPI is looking for mlx4_1, but the VF assigned to this pod is mlx4_2.
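
A possible workaround for the Open MPI side, sketched here under the assumption that the openib BTL is the transport in use, is to restrict it to the assigned device with the `btl_openib_if_include` MCA parameter. The helper name and the program path `./my_mpi_program` are hypothetical placeholders.

```python
import shlex


def mpirun_argv(hca, program, np=2):
    """Build an mpirun command line that limits the openib BTL to one HCA.

    Assumption: Open MPI is built with the openib BTL, which reads the
    btl_openib_if_include MCA parameter.
    """
    if not hca:
        raise ValueError("an HCA name such as 'mlx4_2' is required")
    return ["mpirun", "-np", str(np),
            "--mca", "btl_openib_if_include", hca,
            program]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(mpirun_argv("mlx4_2", "./my_mpi_program")))
```

This has not been verified against this plugin; it only shows how the device name could be passed explicitly instead of relying on Open MPI's default selection.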

Thanks!

hiyijian avatar Mar 16 '18 10:03 hiyijian

It seems you configured the adapter in IB mode, not RoCE mode.

hustcat avatar Mar 23 '18 03:03 hustcat

Yes. Does the plugin only support RoCE for now?

hiyijian avatar Mar 23 '18 03:03 hiyijian

I don't have an IB network, and I am not sure whether IB can work with SR-IOV.

hustcat avatar Mar 23 '18 16:03 hustcat

IB works with SR-IOV. I have a small library in development that this plugin will depend on, and which will take away the intricacies of IB vs. RoCE. This library will also be useful for doing more things on newer kernels. I will update you about it in a few days.

paravmellanox avatar Apr 29 '18 18:04 paravmellanox

@paravmellanox is there any update yet? Thanks

hiyijian avatar May 16 '18 02:05 hiyijian

@hiyijian Hi, I am trying to configure RDMA on IB too. I configured SR-IOV according to the steps in this doc. My device is a ConnectX-3 and every step seemed to go fine, but I can't find any VFs when executing lspci | grep Mellanox. Do you have any ideas? Thanks
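
As a complement to `lspci | grep Mellanox`, the PCI sysfs tree can show whether the physical function advertises SR-IOV at all and how many VFs are currently enabled. This is a rough sketch, assuming the standard Linux attributes `vendor`, `sriov_totalvfs`, `sriov_numvfs` and the `physfn` link that VFs carry; `0x15b3` is the Mellanox PCI vendor ID, and the function name is illustrative.

```python
import os
from pathlib import Path

MELLANOX_VENDOR_ID = "0x15b3"


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if absent."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def mellanox_functions(pci_root="/sys/bus/pci/devices"):
    """List Mellanox PCI functions with their SR-IOV related attributes."""
    found = []
    root = Path(pci_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        if _read(dev / "vendor") != MELLANOX_VENDOR_ID:
            continue
        found.append({
            "address": dev.name,
            # Virtual functions carry a 'physfn' link back to their parent PF.
            "is_vf": os.path.lexists(dev / "physfn"),
            # Both attributes exist only on SR-IOV capable physical functions.
            "total_vfs": _read(dev / "sriov_totalvfs"),
            "num_vfs": _read(dev / "sriov_numvfs"),
        })
    return found


if __name__ == "__main__":
    for fn in mellanox_functions():
        print(fn)
```

If `total_vfs` is missing or `0` on the physical function, SR-IOV is probably not enabled at the firmware or BIOS level; if it is non-zero but `num_vfs` is `0`, the driver has likely not been asked to create any VFs.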

addcloud avatar May 16 '18 08:05 addcloud

@addcloud I am not an expert in networking at all. I was stuck on enabling SR-IOV for quite a long time. The reasons it fails to enable can be quite complicated, including the network type (Ethernet or InfiniBand), BIOS settings, and firmware version (yes, I re-burned the firmware under the supervision of support people from Mellanox). It is quite painful. My suggestion is to contact Mellanox directly.

hiyijian avatar May 17 '18 02:05 hiyijian

@hiyijian thanks for your patience. It's complicated indeed. I will take your suggestion.

addcloud avatar May 17 '18 04:05 addcloud