ucx
ucp_client_server: client does not stop
Describe the bug
When using a message size (-s) above 20, the client never finishes: the server receives the FIN message, but the client does not return.
Steps to Reproduce
- Command line: ucp_client_server -s 100 -a x.x.x.x
- UCX version used (from github branch XX or release YY) + UCX configure flags (can be checked by ucx_info -v)
- Any UCX environment variables used
- UCX 1.15, configured with ./contrib/configure-release --prefix=$PWD/install
Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
  - cat /etc/issue or cat /etc/redhat-release + uname -a
  - For Nvidia Bluefield SmartNIC include cat /etc/mlnx-release (the string identifies software and firmware setup)
- Linux matrix-load-load2-instance 5.4.0-1109-azure #115~18.04.1-Ubuntu SMP Mon May 22 20:06:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Tested successfully on master and v1.15 with:
- client:
./examples/ucp_client_server -s 100 -a X.X.X.X
- server:
./examples/ucp_client_server -s 100
Also tested with -c am added on both sides. Do you also pass the -s parameter on the server side?
I didn't add -s on both sides, only on the client.
Does the server reply to the client with the same message?
Yes, once, so the -s parameter is also needed on the server side.
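Why a missing -s on the server can leave the client hanging while the server finishes can be illustrated with a toy model. This is a hypothetical C sketch, not code from ucp_client_server or UCX: the names toy_recv_t, toy_deliver, toy_recv_done and toy_ping_pong are made up, and it assumes each side's receive only completes once the full message size given by that side's own -s value has arrived.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not UCX code): a receive that completes only once
 * `expected` bytes have arrived. */
typedef struct {
    size_t expected;  /* bytes this side waits for (its own -s value) */
    size_t received;  /* bytes delivered so far */
} toy_recv_t;

static void toy_deliver(toy_recv_t *r, size_t nbytes)
{
    r->received += nbytes;
}

static bool toy_recv_done(const toy_recv_t *r)
{
    return r->received >= r->expected;
}

/* One ping-pong: the client sends client_size bytes, then the server
 * replies exactly once with server_size bytes.
 * Returns true only if both receives complete. */
static bool toy_ping_pong(size_t client_size, size_t server_size)
{
    toy_recv_t server = { server_size, 0 };
    toy_recv_t client = { client_size, 0 };

    toy_deliver(&server, client_size);   /* client -> server request */
    if (!toy_recv_done(&server)) {
        return false;                    /* server would wait forever */
    }
    toy_deliver(&client, server_size);   /* single server -> client reply */
    return toy_recv_done(&client);       /* false: client would wait forever */
}
```

For example, toy_ping_pong(100, 16) returns false: the server's receive completes, but its single 16-byte reply never satisfies a client waiting for 100 bytes. Under the stated assumption, that is consistent with the reported symptom, and passing the same -s value on both sides makes it return true.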
Thanks
When I add -c am, both the client and the server end with a segmentation fault:
./ucp_client_server -a localhost -i 10000 -s 1000 -c am
./ucp_client_server -s 1000 -c am -s 1000
client:
[matrix-load-load2-instance:18990:0:18990] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace (tid: 18990) ====
0 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(ucs_handle_error+0x144) [0x7fa625f9e714]
1 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(+0x30a8c) [0x7fa625f9ea8c]
2 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(+0x30d04) [0x7fa625f9ed04]
3 /lib/x86_64-linux-gnu/libc.so.6(+0x3ef10) [0x7fa625bbbf10]
4 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucp.so.0(ucp_am_handler+0x5a) [0x7fa6262037fa]
5 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(+0x22848) [0x7fa625962848]
6 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(+0x24f78) [0x7fa625964f78]
7 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(ucs_event_set_wait+0xb3) [0x7fa625fa97b3]
8 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(uct_tcp_iface_progress+0x7b) [0x7fa62596501b]
9 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucp.so.0(ucp_worker_progress+0x22) [0x7fa626220eb2]
10 ./ucp_client_server(+0x2702) [0x56426282a702]
11 ./ucp_client_server(+0x276a) [0x56426282a76a]
12 ./ucp_client_server(+0x2f9a) [0x56426282af9a]
13 ./ucp_client_server(+0x35d4) [0x56426282b5d4]
14 ./ucp_client_server(+0x3bea) [0x56426282bbea]
15 ./ucp_client_server(+0x3f35) [0x56426282bf35]
16 ./ucp_client_server(+0x4141) [0x56426282c141]
17 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fa625b9ec87]
18 ./ucp_client_server(+0x170a) [0x56426282970a]
Segmentation fault (core dumped)
server:
Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 18987) ====
0 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(ucs_handle_error+0x144) [0x7f2c63bba714]
1 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(+0x30a8c) [0x7f2c63bbaa8c]
2 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(+0x30d04) [0x7f2c63bbad04]
3 /lib/x86_64-linux-gnu/libc.so.6(+0x3ef10) [0x7f2c637d7f10]
4 /lib/x86_64-linux-gnu/libc.so.6(+0x18eb00) [0x7f2c63927b00]
5 ./ucp_client_server(+0x18c0) [0x55d17b4cc8c0]
6 ./ucp_client_server(+0x2d8d) [0x55d17b4cdd8d]
7 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucp.so.0(ucp_am_handler+0x199) [0x7f2c63e1f939]
8 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(+0x22848) [0x7f2c6357e848]
9 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(+0x24f78) [0x7f2c63580f78]
10 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucs.so.0(ucs_event_set_wait+0xb3) [0x7f2c63bc57b3]
11 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libuct.so.0(uct_tcp_iface_progress+0x7b) [0x7f2c6358101b]
12 /home/ubuntu/yabadi/ucx/ucx-1.15.0/install/lib/libucp.so.0(ucp_worker_progress+0x22) [0x7f2c63e3ceb2]
13 ./ucp_client_server(+0x3d0a) [0x55d17b4ced0a]
14 ./ucp_client_server(+0x3e22) [0x55d17b4cee22]
15 ./ucp_client_server(+0x4126) [0x55d17b4cf126]
16 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f2c637bac87]
17 ./ucp_client_server(+0x170a) [0x55d17b4cc70a]
Segmentation fault (core dumped)
I tried the similar commands below and could not reproduce the issue. Could you please try a later version?
./examples/ucp_client_server -c am -i 10000 -s 10000
./examples/ucp_client_server -c am -i 10000 -s 10000 -a x.x.x.x
Thanks
When I use
./examples/ucp_client_server -s 1000 -i 100 -c am
./examples/ucp_client_server -s 1000 -i 100 -c am -a localhost
the test seems to end (no more prints), but the client does not return, and both the server and the client run at 99% CPU. I compiled the v1.16.x devel branch.
How can I configure it to use Rendezvous, with the server replying to the client (ping-pong)?
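The Rendezvous question is not answered in the thread. A minimal sketch, assuming the UCX_RNDV_THRESH environment variable (the message size from which UCX switches from the eager to the rendezvous protocol) behaves as documented for your UCX version: setting it before UCX reads its configuration should make messages of at least that size use rendezvous. request_rndv_thresh is a hypothetical helper, not part of the UCX API.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper (not a UCX function): ask UCX to use the
 * rendezvous protocol for messages of at least `thresh` bytes by
 * setting UCX_RNDV_THRESH in this process's environment.
 * Assumption: UCX picks up UCX_* variables when the configuration is
 * read (ucp_config_read()), so call this before ucp_init().
 * Returns 0 on success and -1 on failure, as setenv() does. */
static int request_rndv_thresh(const char *thresh)
{
    return setenv("UCX_RNDV_THRESH", thresh, 1 /* overwrite */);
}
```

From the shell, the equivalent is to prefix both the client and the server command lines with UCX_RNDV_THRESH=0 (or another small size). Inside a UCP program, calling ucp_config_modify() on the ucp_config_t returned by ucp_config_read() is an alternative; that the name is then passed without the UCX_ prefix (e.g. "RNDV_THRESH") is an assumption worth checking with ucx_info -c.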
That should be fixed by #9701.
merged #9701