k8s-rdma-shared-dev-plugin

"missing RDMA device spec for device 0000:e5:00.1, RDMA device \"issm\" not found"

Open gurumohan123 opened this issue 1 year ago • 7 comments

I get the error "missing RDMA device spec for device 0000:e5:00.1, RDMA device "issm" not found" when creating a pod after installing k8s-rdma-shared-dev-plugin. What is the solution for this error?

gurumohan123 avatar Jan 09 '24 09:01 gurumohan123

You can run these commands:

mst start 
mst status -v 

Maybe the device at 0000:e5:00.1 is an Ethernet interface card.
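
If the mst tools are not installed, the same question can probably be answered from sysfs. The following is a minimal sketch, not something taken from the plugin: `pci_ids` is a hypothetical helper name, and its optional second argument (an alternative sysfs root) is only there so the function can be tried against a scratch directory.

```shell
# Sketch: print the PCI vendor ID, device ID and class of one device,
# read from sysfs. pci_ids is a hypothetical helper name.
pci_ids() {
    local addr="$1"
    # Assumption: the optional second argument overrides the sysfs root.
    local root="${2:-/sys/bus/pci/devices}"
    local dir="$root/$addr"
    local vendor device class

    if [ ! -d "$dir" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    vendor=$(cat "$dir/vendor")
    device=$(cat "$dir/device")
    class=$(cat "$dir/class")

    # sysfs stores these values as 0x-prefixed hex; strip the prefix.
    echo "vendor=${vendor#0x} device=${device#0x} class=${class#0x}"
}
```

Calling `pci_ids 0000:e5:00.1` on the node prints one line. A class starting with 0200 is an Ethernet controller and 0207 is an InfiniBand controller; vendor 15b3 is Mellanox.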

sober-wang avatar Jan 19 '24 06:01 sober-wang

I also encountered the same issue. My logs were as follows:

error creating new device: "missing RDMA device spec for device 0000:e1:00.0, RDMA device \"issm\" not found"

I added the device's deviceID and vendor information to the RDMA Shared Device Plugin configurations, then applied the changes and restarted the pods. The problem was resolved.

My RDMA Shared Device Plugin Configuration is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
        "periodicUpdateInterval": 300,
        "configList": [{
             "resourceName": "cx5_bond_shared_devices_a",
             "rdmaHcaMax": 1000,
             "selectors": {
               "vendors": ["15b3"],
               "deviceIDs": ["1017","1019"]
             }
           },
           {
             "resourceName": "cx6dx_shared_devices_b",
             "rdmaHcaMax": 500,
             "selectors": {
               "vendors": ["15b3"],
               "deviceIDs": ["101d"]
             }
           }
        ]
    }
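
To double-check that a NIC's IDs are actually listed in a configuration like the one above, a rough text search over a saved copy of config.json may be enough. This is only a sketch: `config_mentions_ids` is a hypothetical helper name, and it does not reproduce how the plugin itself matches selectors.

```shell
# Sketch: succeed if a vendor ID and a device ID both appear, quoted,
# somewhere in a saved copy of config.json. This is a plain text search,
# not a JSON parser, so it does not confirm that both IDs sit in the
# same selector. config_mentions_ids is a hypothetical helper name.
config_mentions_ids() {
    local file="$1" vendor="$2" device="$3"
    # -i because hex IDs may be written in either case.
    grep -qi "\"$vendor\"" "$file" && grep -qi "\"$device\"" "$file"
}
```

With the ConfigMap above, the JSON can be saved with `kubectl get configmap rdma-devices -n kube-system -o jsonpath='{.data.config\.json}' > config.json`, after which `config_mentions_ids config.json 15b3 101d && echo listed` reports whether both IDs are present.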

a-c-dream avatar Jun 20 '24 12:06 a-c-dream

Assuming 0000:e5:00.1 belongs to an MLNX NIC, the error "issm" not found means that a Linux char device (found under /dev/infiniband/issm<N>) is missing for the selected NIC. That means not all RDMA modules were loaded. Do you have the rdma-core package installed? It sets up udev rules to bind the needed drivers to the MLNX NIC.
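
A quick way to see whether those char devices exist on a node, as a minimal sketch: `list_issm` is a hypothetical helper name, and the optional directory argument is an assumption added so the function can be tried outside /dev.

```shell
# Sketch: list issm device nodes and return non-zero if there are none.
# list_issm is a hypothetical helper name.
list_issm() {
    # Assumption: the optional argument overrides the device directory.
    local root="${1:-/dev/infiniband}"
    local found=1
    local node

    for node in "$root"/issm*; do
        # An unmatched glob stays literal, so check that the path exists.
        # -e is used instead of -c so a scratch directory also works.
        [ -e "$node" ] || continue
        echo "$node"
        found=0
    done

    return "$found"
}
```

Running `list_issm || echo "no issm devices"` on the affected node shows whether anything is there. If the list is empty, `lsmod | grep ib_umad` tells you whether the ib_umad module, which creates the umad and issm nodes, is loaded.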

adrianchiris avatar Jun 24 '24 06:06 adrianchiris

That solved it for me! Thanks.

souleb avatar Jul 10 '24 16:07 souleb