
superbench mpi job should use the proper Ethernet interface

Open LiweiPeng opened this issue 3 years ago • 1 comment

Several superbench tests (e.g. nccl-tests, ib-traffic) use Open MPI to launch the tests across multiple nodes. Some node types have multiple Ethernet interfaces (e.g. azure2, eth0, docker0, ib0, ib1), and the working IPv4 Ethernet interface is not always the default eth0 (e.g. it may be azure2).

While a user can manually inspect the node type and figure out which interface to pass to MPI (e.g. --mca btl_tcp_if_include azure2 --mca oob_tcp_if_include azure2), this approach is not generic across different node types.

Expected: because superbench launches the MPI command, superbench should detect the proper Ethernet interface to use and add it to the Open MPI command line.

The following is one way to find this interface in bash. It would be much simpler to do this in Python.

get_eth_interfaces() {
    IPV4List=$(ip -4 -f inet a | grep mtu | awk '{print $2}' | sed ':a; N; $!ba; s/\n//g')
    for ifname in $(ls /sys/class/net); do
        # type 1 = Ethernet; skip bridges (the bridge entry in sysfs is a directory)
        if [[ -f /sys/class/net/$ifname/type && $(cat /sys/class/net/$ifname/type) -eq 1 && ! -d /sys/class/net/$ifname/bridge ]]; then
            isIPV4=$(echo ${IPV4List} | grep "$ifname:" | wc -l)
            isDocker=$(echo $ifname | grep docker | wc -l)
            if [[ "${isIPV4}" == "1" && "${isDocker}" == "0" ]]; then
                echo $ifname
            fi
        fi
    done
}
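A Python version of the same filter could look like the sketch below. This is only an illustration, not superbench code: the sysfs layout and the SIOCGIFADDR ioctl are Linux-specific assumptions, and the pure `is_candidate` helper is a name I made up to keep the filtering logic testable.

```python
# Sketch only: a Python take on the bash get_eth_interfaces above (Linux-specific).
import fcntl
import os
import socket
import struct

SIOCGIFADDR = 0x8915  # ioctl to query an interface's IPv4 address


def is_candidate(ifname, if_type, is_bridge, has_ipv4):
    """Same filter as the bash version: Ethernet (ARPHRD type 1),
    IPv4-configured, not a bridge, not a docker interface."""
    return if_type == 1 and not is_bridge and has_ipv4 and "docker" not in ifname


def _has_ipv4(ifname):
    """True if the interface currently has an IPv4 address assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            fcntl.ioctl(s.fileno(), SIOCGIFADDR,
                        struct.pack("256s", ifname[:15].encode()))
            return True
        except OSError:
            return False


def get_eth_interfaces(sys_net="/sys/class/net"):
    """Enumerate sysfs interfaces and keep the candidates."""
    result = []
    for ifname in sorted(os.listdir(sys_net)):
        type_path = os.path.join(sys_net, ifname, "type")
        if not os.path.isfile(type_path):
            continue
        with open(type_path) as f:
            if_type = int(f.read().strip())
        # a bridge exposes a "bridge" subdirectory in sysfs
        is_bridge = os.path.isdir(os.path.join(sys_net, ifname, "bridge"))
        if is_candidate(ifname, if_type, is_bridge, _has_ipv4(ifname)):
            result.append(ifname)
    return result
```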

LiweiPeng avatar Aug 18 '22 21:08 LiweiPeng

When there are multiple Ethernet interfaces available, there is no single "default": OMPI automatically detects usable interfaces using its routability algorithm. That detection is complex, and it still cannot cover some cases, which require the user to specify btl_tcp_if_include or btl_tcp_if_exclude explicitly.

You can pass such parameters through in the config yaml:

modes:
- name: mpi
  mca:
    btl_tcp_if_include: azure2

The proposed get_eth_interfaces detection cannot cover some scenarios either, and may not be better than OMPI's current detection. Here are two cases:

  • "MPI job that spans public and private networks" that OMPI cannot cover, consider the following two nodes

    |    Node A     |     |    Node B     |
    |               |     |               |
    |     eth0      |     |     eth0      |    # public network, cannot route to each other
    | 192.168.100.1 |     | 192.168.100.2 |
    |               |     |               |
    |     eth1      |     |     eth1      |    # private network, can route to each other
    | 192.168.200.1 |     | 192.168.200.2 |
    

    eth0 is only used to access the Internet and the two nodes cannot route to each other over it, while the eth1 interfaces of the two nodes are connected to the same switch on the private network. Your detection method would consider both of them usable (IPv4, has an address, not a bridge, not docker); it is hard to choose the correct one without extra knowledge of the cluster networking.

  • When all Ethernet interfaces (azure2 and eth0 in your case) are disabled or unavailable, IPoIB can also be used for MPI; for example, changing to btl_tcp_if_include ib0 should also work in your environment. The get_eth_interfaces detection would exclude such suitable interfaces.
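To make the first case above concrete: a naive address-based reachability check (a hypothetical `same_subnet` helper, using the addresses from the diagram) cannot distinguish the unusable public eth0 pair from the usable private eth1 pair, because both pairs sit in matching subnets. Actual routability depends on the physical topology, which addresses alone do not reveal.

```python
# Illustration with the hypothetical addresses from the diagram above.
import ipaddress


def same_subnet(ip_a, ip_b, prefix=24):
    """True if both addresses fall in the same /prefix network."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a == net_b


# eth0 on node A and node B: same /24, yet not actually connected
print(same_subnet("192.168.100.1", "192.168.100.2"))  # True
# eth1 pair is also in one /24 -- an address-based check alone cannot
# tell the usable private network from the unusable public one
print(same_subnet("192.168.200.1", "192.168.200.2"))  # True
```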

abuccts avatar Aug 31 '22 11:08 abuccts

Will use one by setting it in the config file.

LiweiPeng avatar Sep 01 '22 19:09 LiweiPeng