UCP/WIREUP: specify a reason for wireup failure - part 2
What
During the wireup process, provide a reason when a resource's lane is unreachable. This is the second part, following https://github.com/openucx/ucx/pull/9995, and covers all the transports other than IB.
Why ?
Users need more information about why a device is not reachable after an unsuccessful connection establishment. This information should be passed from UCT to UCP.
How ?
Pass a string buffer to the select/search_lane functions so that, if the wireup process fails, the reason can be printed out by an upper layer.
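For illustration only, here is a minimal sketch of the pattern described above: the caller owns a string buffer, the lower-level reachability check fills it on failure, and the selection function accumulates the reasons so the upper layer can print them. All names (`toy_resource_t`, `toy_is_reachable`, `toy_select_lane`) and the `domain_id` comparison are hypothetical stand-ins, not the actual UCT/UCP types or functions touched by this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a transport resource; not a UCX type. */
typedef struct {
    const char *tl_name;   /* transport name, e.g. "tcp" */
    int         domain_id; /* toy reachability key, assumed for this sketch */
} toy_resource_t;

/* Lower layer: returns 1 if the remote side is reachable through rsc.
 * Otherwise returns 0 and writes a human-readable reason into the
 * caller-provided buffer. */
static int toy_is_reachable(const toy_resource_t *rsc, int remote_domain_id,
                            char *reason, size_t reason_len)
{
    assert((reason != NULL) && (reason_len > 0));

    if (rsc->domain_id != remote_domain_id) {
        snprintf(reason, reason_len, "%s: local domain %d != remote domain %d",
                 rsc->tl_name, rsc->domain_id, remote_domain_id);
        return 0;
    }

    reason[0] = '\0';
    return 1;
}

/* Upper layer: returns the index of the first reachable resource, or -1.
 * On failure, reasons holds the per-resource reasons separated by "; ".
 * The contents of reasons are meaningful only when -1 is returned. */
static int toy_select_lane(const toy_resource_t *rscs, size_t num_rscs,
                           int remote_domain_id, char *reasons,
                           size_t reasons_len)
{
    char   reason[128];
    size_t used = 0;
    size_t i;
    int    n;

    assert((reasons != NULL) && (reasons_len > 0));
    reasons[0] = '\0';

    for (i = 0; i < num_rscs; ++i) {
        if (toy_is_reachable(&rscs[i], remote_domain_id, reason,
                             sizeof(reason))) {
            return (int)i;
        }

        /* Append only while there is room; snprintf always NUL-terminates,
         * so a truncated reason list is still a valid string. */
        if (used < reasons_len) {
            n = snprintf(reasons + used, reasons_len - used, "%s%s",
                         (used > 0) ? "; " : "", reason);
            if (n > 0) {
                used += (size_t)n;
            }
        }
    }

    return -1;
}
```

With this shape the caller decides where and at what log level the accumulated reasons are reported, while the lower layer only describes why a given resource was rejected.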
@shasson5 can you pls review?
failure seems relevant:
2024-08-26T07:53:11.2247900Z [ RUN ] tcp/test_uct_sockaddr_err_handle_non_exist_ip.conn_to_non_exist_ip/2 </lo>
2024-08-26T07:53:11.2250959Z [ INFO ] Testing tcp on 0.0.0.0:49036 interface lo
2024-08-26T07:55:18.5225372Z [ OK ] tcp/test_uct_sockaddr_err_handle_non_exist_ip.conn_to_non_exist_ip/2 (127298 ms)
2024-08-26T07:55:18.5256477Z [----------] 1 test from tcp/test_uct_sockaddr_err_handle_non_exist_ip (127298 ms total)
2024-08-26T07:55:18.5257776Z
2024-08-26T07:55:18.5258713Z [----------] 1 test from cuda_ipc/test_uct_ep
2024-08-26T07:55:18.5261351Z [ RUN ] cuda_ipc/test_uct_ep.is_connected/0 <cuda_ipc/cuda>
2024-08-26T07:55:18.5262355Z /__w/1/s/contrib/../test/gtest/uct/test_uct_ep.cc:220: Failure
2024-08-26T07:55:18.5262982Z Value of: is_connected_to_sender(*m_receiver)
2024-08-26T07:55:18.5263335Z Actual: false
2024-08-26T07:55:18.5263591Z Expected: true
2024-08-26T07:55:18.5263890Z [ FAILED ] cuda_ipc/test_uct_ep.is_connected/0, where GetParam() = cuda_ipc/cuda (0 ms)
2024-08-26T07:55:18.5264403Z [----------] 1 test from cuda_ipc/test_uct_ep (0 ms total)
coverity failure seems relevant
@yosefe it needs to be approved again. Thanks
@amastbaum please squash