ansible-role-interfaces

Configuring an IP over InfiniBand (`ipoib`) interface fails with "Interface ib0 is not active"

Open • Aethylred opened this issue 2 years ago • 5 comments

I'm trying to set up an InfiniBand interface on a Mellanox ConnectX-6 with OFED driver version 5.5-1.0.3.2 on Rocky Linux 8.5.

Drivers are installed and interfaces can be brought up manually.

I'm calling the role separately like this, because it has already been called earlier to set up the physical Ethernet interfaces:

```yaml
---
- name: Configure InfiniBand interfaces
  hosts: infiniband

  tasks:
    - name: Configure InfiniBand interfaces
      import_role:
        name: michaelrigart.interfaces
      vars:
        interfaces_pause_time: 120
        interfaces_ether_interfaces:
          - device: "{{ infiniband_interface }}"
            bootproto: static
            address: "{{ ib_ip }}"
            netmask: "{{ infiniband_netmask }}"
            type: ipoib
      become: true
```

I added `interfaces_pause_time: 120` because I assumed the interfaces were just taking time to become active after being bounced, I'

However, when I run the playbook, it ends with:

```text
RUNNING HANDLER [michaelrigart.interfaces : Check active Ethernet interface state] *********************************************
failed: [ib-host11] (item={'device': 'ib0', 'bootproto': 'static', 'address': '10.10.10.11', 'netmask': '255.255.252.0', 'type': 'ipoib'}) => {"ansible_loop_var": "item", "changed": false, "item": {"address": "10.10.10.11", "bootproto": "static", "device": "ib0", "netmask": "255.255.252.0", "type": "ipoib"}, "msg": "Interface ib0 is not active"}
```

I've checked the other issues about `ipoib`: #76 and #58 look like they've been resolved, and their fixes don't seem to help with this one.

Aethylred • Apr 12 '22 04:04

Hi @Aethylred. You can see where that error is generated here. It means that the gathered Ansible facts for the interface report it as not active.

You could check the actual interface status to see whether it is up, and check the generated ifcfg file to see whether it matches what you expect.
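For example, both checks could be run as a small standalone playbook right after the role fails. This is a sketch, not part of the role: it assumes the device is `ib0` and that network scripts live in `/etc/sysconfig/network-scripts/` (RHEL-style, as on Rocky 8), and the `iface`, `ib_operstate` and `ib_ifcfg` names are made up for illustration. As far as I know, Ansible's Linux network facts derive `active` from the kernel's `/sys/class/net/<device>/operstate`, so the two state values printed here should normally agree; treat that as an assumption.

```yaml
---
- name: Inspect an IPoIB interface (illustrative sketch)
  hosts: infiniband
  become: true
  gather_facts: false
  vars:
    iface: ib0  # assumption: the IPoIB device to inspect
  tasks:
    # Re-gather only the network facts so "active" reflects the current state.
    - name: Refresh network facts
      ansible.builtin.setup:
        gather_subset:
          - network

    - name: Show the "active" fact that the role checks
      ansible.builtin.debug:
        var: ansible_facts[iface].active

    # Read the kernel's view of the interface directly from sysfs.
    - name: Read operstate from sysfs
      ansible.builtin.slurp:
        src: "/sys/class/net/{{ iface }}/operstate"
      register: ib_operstate

    - name: Show operstate
      ansible.builtin.debug:
        msg: "{{ iface }} operstate: {{ ib_operstate.content | b64decode | trim }}"

    # Assumption: RHEL-style network scripts, as used on Rocky 8.
    - name: Read the generated ifcfg file
      ansible.builtin.slurp:
        src: "/etc/sysconfig/network-scripts/ifcfg-{{ iface }}"
      register: ib_ifcfg

    - name: Show the generated ifcfg file
      ansible.builtin.debug:
        msg: "{{ (ib_ifcfg.content | b64decode).splitlines() }}"
```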

markgoddard • Apr 12 '22 08:04

After the playbook fails, I logged into the host: `ifcfg-ib0` looks good and `ifup ib0` works.

Aethylred • Apr 12 '22 08:04

If I extend the pause to `interfaces_pause_time: 300`, the playbook succeeds.

I think there may be a delay while the interface and our subnet manager sort themselves out.
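If the subnet manager is the cause, one way to observe it is to watch the InfiniBand port state, which only reaches `ACTIVE` once the SM has configured the port (before that it sits in `INIT`). The sketch below polls the sysfs port state instead of sleeping for a fixed time. The HCA name `mlx5_0` and port `1` are assumptions (ConnectX-6 cards use the `mlx5` driver, but check `/sys/class/infiniband/` on your hosts), and `ib_hca`, `ib_port` and `ib_port_state` are hypothetical names, not role variables.

```yaml
---
- name: Wait for the IB port to become ACTIVE (illustrative sketch)
  hosts: infiniband
  become: true
  gather_facts: false
  vars:
    ib_hca: mlx5_0  # assumption: HCA name under /sys/class/infiniband
    ib_port: 1      # assumption: the HCA port that backs ib0
  tasks:
    # The state file reads e.g. "2: INIT" until the subnet manager has
    # configured the port, then "4: ACTIVE".
    - name: Poll the HCA port state
      ansible.builtin.command: "cat /sys/class/infiniband/{{ ib_hca }}/ports/{{ ib_port }}/state"
      register: ib_port_state
      changed_when: false
      until: ib_port_state.stdout is search('ACTIVE$')
      retries: 60  # 60 x 5 s, roughly the 300 s pause that worked
      delay: 5
```

Timing this task on a freshly bounced interface would show whether the port state, rather than the IP configuration, accounts for the delay.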

Aethylred • Apr 12 '22 09:04

Interesting. Is there anything we need to change here?

markgoddard • Apr 12 '22 09:04

Not sure. I think it would be better if the role polled for the interface to become 'ready' or 'active', rather than refreshing the facts to get the interface state.

Ideally with a retry limit and a timeout.
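A sketch of that idea in plain Ansible: poll the kernel's `operstate` for each interface with a bounded number of retries, instead of refreshing facts after a fixed pause. This is not how the role currently works; `iface_wait_retries`, `iface_wait_delay` and `wait_interfaces` are hypothetical names, and the overall timeout is roughly `retries * delay` seconds.

```yaml
---
- name: Wait for interfaces to become active (illustrative sketch)
  hosts: infiniband
  become: true
  gather_facts: false
  vars:
    # Hypothetical settings: total wait is about retries * delay seconds.
    iface_wait_retries: 30
    iface_wait_delay: 10
    # Hypothetical list using the same "device" key as interfaces_ether_interfaces.
    wait_interfaces:
      - device: ib0
  tasks:
    - name: Poll operstate until each interface reports "up"
      ansible.builtin.command: "cat /sys/class/net/{{ item.device }}/operstate"
      loop: "{{ wait_interfaces }}"
      register: iface_state
      changed_when: false
      # Inside a loop, iface_state here refers to the current item's result.
      until: iface_state.stdout == 'up'
      retries: "{{ iface_wait_retries }}"
      delay: "{{ iface_wait_delay }}"
```

Checking for `up` is stricter than treating anything other than `down` as active; an interface whose driver reports `unknown` would time out with this check, so the condition may need loosening for other interface types.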

Aethylred • Apr 14 '22 02:04