ucx
UCP/RNDV/GTEST: Fix support of user's memh for RNDV AM and PUT Zcopy schemes
What
Fix support of user's memh for RNDV AM and PUT Zcopy schemes.
Why?
The user's memory handle isn't used by the RNDV AM and PUT Zcopy schemes.
How?
- Update the ucp_rndv_reg_send_buffer function to set the user's memory handle even if the RNDV scheme isn't set to GET Zcopy.
- Reproduce the issue for the RNDV AM and PUT Zcopy schemes by updating test_am_send_recv to support a send memory handle.
@hoopoepg can you pls review as well?
test failure seems relevant:
[ OK ] udx/test_ucp_sockaddr_protocols.stream_zcopy_64k_exp/0 (444 ms)
[ RUN ] udx/test_ucp_sockaddr_protocols.am_rndv_64k_recv_prereg_single_rndv_put_zcopy_lane/0 <ud_x,cuda_copy,rocm_copy/mt>
[ INFO ] server listening on 10.10.3.1:58904
[swx-rain03-bf1:1839912:0:1839912] ucp_mm.c:111 Assertion `alloc_md_memh_p != ((void *)0)' failed
==== backtrace (tid:1839912) ====
0 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_handle_error+0x2d4) [0xffff898a33fc]
1 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_message+0xe0) [0xffff898a06a8]
2 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_format+0x100) [0xffff898a07b0]
3 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_mem_rereg_mds+0x764) [0xffff896cdd14]
4 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_request_memory_reg+0x35c) [0xffff896d4824]
5 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_rndv_rtr_handler+0x698) [0xffff8973f090]
6 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/uct/ib/.libs/libuct_ib.so.0(uct_ud_ep_process_rx+0x298) [0xffff895ef860]
7 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/uct/ib/.libs/libuct_ib.so.0(+0x90344) [0xffff895fa344]
8 /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_worker_progress+0x68) [0xffff896e1ec8]
@yosefe fixed, it was a test issue: the test needs to check whether resetting MDs is allowed (in which case the PUT/GET Zcopy protocols are actually used) or not (in which case it falls back to AM)
@yosefe @hoopoepg could you pls review one more time? The test issue is fully resolved.
@yosefe squashed, could you pls re-approve?