Allen
> > Yes, that is expected. We don't configure mlx5_1 and mlx5_3. > > Seems like the test being run on mlx5_1? Per the command above: > > > `UCX_PROTO_ENABLE=y...
@yosefe Sorry, it looks like a network failure. I'll look into it myself first. > Can you try: `ping -I ens7f0np0 10.16.39.12` on node13? ``` [root@localhost network-scripts]# ping -I ens7f0np0...
@yosefe With `UCX_NET_DEVICES=mlx5_0:1,mlx5_2:1` configured, it now works, but I only get 2*100Gbps of bandwidth. However, in my other environment (4*25Gbps), it works properly without restricting `UCX_NET_DEVICES` to `mlx5_0:1,mlx5_2:1`, and I get the full 4*25Gbps of bandwidth.
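For reference, a minimal sketch of the device restriction and the bandwidth arithmetic described in the comment above. The helper names `ucx_net_devices_env` and `aggregate_gbps` are hypothetical and written only for illustration; nothing here calls UCX itself, and the figures are theoretical link rates, not measured throughput.

```python
def ucx_net_devices_env(devices, port=1):
    """Build the UCX_NET_DEVICES setting for a list of device names.

    Hypothetical helper: it only formats the value in the
    `device:port` form used in the comment above.
    """
    return {"UCX_NET_DEVICES": ",".join(f"{dev}:{port}" for dev in devices)}


def aggregate_gbps(num_links, gbps_per_link):
    """Theoretical aggregate link rate, assuming traffic is spread over all links."""
    return num_links * gbps_per_link


# Restricting UCX to the two configured NICs, as in the comment.
restricted = ucx_net_devices_env(["mlx5_0", "mlx5_2"])
print(restricted["UCX_NET_DEVICES"])  # mlx5_0:1,mlx5_2:1

# Two 100 Gbps links versus four 25 Gbps links.
print(aggregate_gbps(2, 100))  # 200
print(aggregate_gbps(4, 25))   # 100
```

The resulting dictionary could be merged into a process environment before launching the application, for example with `subprocess.run(cmd, env={**os.environ, **restricted})`.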
> [@ivanallen](https://github.com/ivanallen) what is the network speed of each NIC (can be checked by ibstat or ibv_devinfo)? Does the other environment have more configured NICs? Hi @yosefe, Can we look...
> > Sorry for another question. What is purpose of the memh in ucp_request_param_t. It says is a pre-registered buffer. But what if I have already a pointer to ucp_am_recv_data_nbx,...
> For iov data, the memh optimization parameter can be used only if all iov buffers are within the memh region. @yosefe Does this include the AM header buffer?
> Are you using multiple threads? Can you pls check if setting `UCX_IB_MMIO_MODE=db_lock` fixes the issue? If it does you need to initialize UCP worker with threading support Hi @brminich...
@brminich > but is your client and/or server using multiple threads? Not for polling worker, but for something else? If yes, pls try to initialize UCP worker mode as UCS_THREAD_MODE_SERIALIZED...
@brminich I found some clues. When I pass only a single piece of data (using a contiguous buffer), only the first 8 bytes of the data are corrupted. ``` -- client log (server...
I tried several times and found that the data that was corrupted had certain patterns. In the same process, the corrupted data is the same, and a new value is...