misc: use PMI_HOSTNAME to select preferred interface
Pull Request Description
We don't have a mechanism for ch4 to select a preferred interface, as we do in ch3 using MPIR_CVAR_CH3_INTERFACE_HOSTNAME. This PR turns that CVAR into PMI_HOSTNAME, making it a general PMI mechanism. We use it to select the libfabric provider when multi-NIC is not enabled.
[skip warnings]
Fixes #7028
Author Checklist
- [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
- [ ] Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement: For non-Argonne authors, check the contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
Tested on my local computer:
$ MPIR_CVAR_DEBUG_SUMMARY=1 mpirun -hosts 172.17.0.1 -n 1 ./cpi
==== Various sizes and limits ====
sizeof(MPIDI_per_vci_t): 192
Required minimum FI_VERSION: 0, current version: 1000f
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 140.221.16.19
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 172.17.0.1
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1
provider: shm, score = 4, pref = -2, FI_ADDR_STR [17] - fi_shm://1565940
provider: shm, score = 4, pref = -2, FI_ADDR_STR [17] - fi_shm://1565940
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 140.221.16.19
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 172.17.0.1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::9603:d3de:b776:c1e9
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42:f8ff:fe53:d2d7
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1
provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1
Required minimum FI_VERSION: 10005, current version: 1000f
==== Capability set configuration ====
libfabric provider: sockets - 172.17.0.0/16
MPIDI_OFI_ENABLE_DATA: 1
MPIDI_OFI_ENABLE_AV_TABLE: 1
MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1
MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0
MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0
MPIDI_OFI_ENABLE_MR_ALLOCATED: 0
MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1
MPIDI_OFI_ENABLE_MR_PROV_KEY: 0
MPIDI_OFI_ENABLE_TAGGED: 1
MPIDI_OFI_ENABLE_AM: 1
MPIDI_OFI_ENABLE_RMA: 1
MPIDI_OFI_ENABLE_ATOMICS: 1
MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1
MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0
MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1
MPIDI_OFI_ENABLE_TRIGGERED: 0
MPIDI_OFI_ENABLE_HMEM: 0
MPIDI_OFI_NUM_AM_BUFFERS: 8
MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0
MPIDI_OFI_CONTEXT_BITS: 20
MPIDI_OFI_SOURCE_BITS: 0
MPIDI_OFI_TAG_BITS: 31
MPIDI_OFI_VNI_USE_DOMAIN: 1
MAXIMUM SUPPORTED RANKS: 4294967296
MAXIMUM TAG: 2147483648
==== Provider global thresholds ====
max_buffered_send: 255
max_buffered_write: 255
max_msg_size: 9223372036854775807
max_order_raw: -1
max_order_war: -1
max_order_waw: -1
tx_iov_limit: 8
rx_iov_limit: 8
rma_iov_limit: 8
max_mr_key_size: 8
==== Various sizes and limits ====
MPIDI_OFI_AM_MSG_HEADER_SIZE: 24
MPIDI_OFI_MAX_AM_HDR_SIZE: 255
sizeof(MPIDI_OFI_am_request_header_t): 416
sizeof(MPIDI_OFI_per_vci_t): 52480
MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024
MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384
==== OFI dynamic settings ====
num_vcis: 1
num_nics: 1
======================================
error checking : enabled
QMPI : disabled
debugger support : disabled
thread level : MPI_THREAD_SINGLE
thread CS : per-vci
threadcomm : enabled
==== data structure summary ====
sizeof(MPIR_Comm): 1808
sizeof(MPIR_Request): 512
sizeof(MPIR_Datatype): 280
================================
Process 0 of 1 is on tiger
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000066
It selects the 172.17.0.0/16 interface even though that interface is not the first one returned by libfabric.
test:mpich/ch3/most test:mpich/ch4/most
What value did you set PMI_HOSTNAME to?
To the IP address of the local host. With this PR, Hydra (mpiexec) sets it automatically.