GPU test fails on assertion

Open iyastreb opened this issue 2 years ago • 0 comments

Describe the bug

The "gpu on worker 0" test dcx/test_ucp_tag_mem_type.reuse_buffers_mrail/15 <dc_x,cuda_copy,rocm_copy/host:cuda,offload> failed on an assertion that seems to be unrelated to my change (https://github.com/openucx/ucx/pull/9525):

[ RUN      ] dcx/test_ucp_tag_mem_type.reuse_buffers_mrail/15 <dc_x,cuda_copy,rocm_copy/host:cuda,offload>
[     INFO ] 0 1 16 128 1048512 1048580 4194324 180466 [swx-rdmz-ucx-gpu-01:32224:4:32224] datatype_iter.inl:320  Assertion `(memh == NULL) || ucp_memh_is_zero_length(memh) || ucp_memh_is_user_memh(memh)' failed: memh=0x2391c100
==== backtrace (tid:  32224) ====
 0  /__w/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_handle_error+0x12c) [0x507252c]
 1  /__w/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_message+0x58) [0x506f598]
 2  /__w/1/s/build-test/src/ucs/.libs/libucs.so.0(ucs_fatal_error_format+0xd1) [0x506f721]
 3  /__w/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_proto_rndv_ats_complete+0x230) [0x57d5dc0]
 4  /__w/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_proto_rndv_ats_progress+0x164) [0x57d6644]
 5  /__w/1/s/build-test/src/ucp/.libs/libucp.so.0(+0xb8f65) [0x57dcf65]
 6  /__w/1/s/build-test/src/uct/cuda/.libs/libuct_cuda.so.0(+0x92fa) [0x64ab2fa]
 7  /__w/1/s/build-test/src/ucp/.libs/libucp.so.0(ucp_worker_progress+0x72) [0x5796032]
 8  /__w/1/s/build-test/test/gtest/gtest() [0xab582c]

Is this a known issue?

Here is the full job log: https://dev.azure.com/ucfconsort/ucx/_build/results?buildId=72747&view=logs&j=da813497-1cca-54a8-f0a9-cb3fd519bc00&t=1f53f049-9d56-5ab2-8515-95f5f52811a1

iyastreb, Dec 05 '23 07:12