UCP/RNDV/GTEST: Fix support of user's memh for RNDV AM and PUT Zcopy schemes

Open dmitrygx opened this issue 3 years ago • 2 comments

What

Fix support of user's memh for RNDV AM and PUT Zcopy schemes.

Why?

A user-provided memory handle (memh) is currently ignored by the RNDV AM and PUT Zcopy schemes.

How?

  1. Update the ucp_rndv_reg_send_buffer function to set the user's memory handle even if the RNDV scheme isn't GET Zcopy.
  2. Reproduce the issue for the RNDV AM and PUT Zcopy schemes by updating test_am_send_recv to support a send memory handle.
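
The change in step 1 can be illustrated with a minimal, self-contained model. This is only a sketch, not the actual UCX code: the types `rndv_mode_t`, `user_memh_t`, `send_req_t` and both functions are hypothetical stand-ins, and the real `ucp_rndv_reg_send_buffer` does considerably more (for example, registering the buffer). The sketch shows only the control-flow difference: attaching the user's handle inside a GET-Zcopy-only branch versus attaching it for every scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT real UCX types. */
typedef enum {
    RNDV_MODE_GET_ZCOPY,
    RNDV_MODE_PUT_ZCOPY,
    RNDV_MODE_AM
} rndv_mode_t;

typedef struct {
    int id;
} user_memh_t;

typedef struct {
    const user_memh_t *user_memh; /* handle supplied by the user, may be NULL */
    const user_memh_t *send_memh; /* handle the send path will actually use */
} send_req_t;

/* Assumed "before" behavior: the user's handle is attached only when the
 * scheme is GET Zcopy, so PUT Zcopy and AM never see it. */
static void reg_send_buffer_before(send_req_t *req, rndv_mode_t mode)
{
    assert(req != NULL);

    if (mode != RNDV_MODE_GET_ZCOPY) {
        return; /* user's handle is silently ignored */
    }

    req->send_memh = req->user_memh;
}

/* Assumed "after" behavior: the user's handle is attached regardless of
 * the scheme; any GET-Zcopy-specific work would follow separately. */
static void reg_send_buffer_after(send_req_t *req, rndv_mode_t mode)
{
    assert(req != NULL);

    req->send_memh = req->user_memh;

    if (mode == RNDV_MODE_GET_ZCOPY) {
        /* GET-Zcopy-specific handling would go here (omitted in this sketch) */
    }
}
```

With this model, a request that carries a user handle and goes through the PUT Zcopy or AM scheme ends up with `send_memh == NULL` in the "before" variant and with the user's handle in the "after" variant.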

dmitrygx avatar Aug 07 '22 14:08 dmitrygx

@hoopoepg can you pls review as well?

yosefe avatar Aug 08 '22 16:08 yosefe

test failure seems relevant:

[       OK ] udx/test_ucp_sockaddr_protocols.stream_zcopy_64k_exp/0 (444 ms)
[ RUN      ] udx/test_ucp_sockaddr_protocols.am_rndv_64k_recv_prereg_single_rndv_put_zcopy_lane/0 <ud_x,cuda_copy,rocm_copy/mt>
[     INFO ] server listening on 10.10.3.1:58904
[swx-rain03-bf1:1839912:0:1839912]      ucp_mm.c:111  Assertion `alloc_md_memh_p != ((void *)0)' failed
==== backtrace (tid:1839912) ====
 0  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_handle_error+0x2d4) [0xffff898a33fc]
 1  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_message+0xe0) [0xffff898a06a8]
 2  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_format+0x100) [0xffff898a07b0]
 3  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_mem_rereg_mds+0x764) [0xffff896cdd14]
 4  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_request_memory_reg+0x35c) [0xffff896d4824]
 5  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_rndv_rtr_handler+0x698) [0xffff8973f090]
 6  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/uct/ib/.libs/libuct_ib.so.0(uct_ud_ep_process_rx+0x298) [0xffff895ef860]
 7  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/uct/ib/.libs/libuct_ib.so.0(+0x90344) [0xffff895fa344]
 8  /scrap/azure/agent-01/AZP_WORKSPACE/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_worker_progress+0x68) [0xffff896e1ec8]

yosefe avatar Aug 10 '22 10:08 yosefe

> test failure seems relevant:

@yosefe fixed, it was a test issue: the test needs to check whether resetting MDs is allowed (in which case the PUT/GET Zcopy protocols are actually used) or not (fallback to AM)

dmitrygx avatar Aug 10 '22 21:08 dmitrygx

@yosefe @hoopoepg could you pls review one more time? the test issue is fully resolved

dmitrygx avatar Aug 12 '22 04:08 dmitrygx

@yosefe squashed, could you pls re-approve?

dmitrygx avatar Aug 13 '22 21:08 dmitrygx