ucx
Request: support for RDMA write with immediate data
Is there a reason why UCX does not support IBV_WR_RDMA_WRITE_WITH_IMM? Grepping the source code did not find any hits.
The current UCP high-level API we defined (modeled after OpenSHMEM) does not have a use case for WRITE_WITH_IMM. Do you see a case where it would be useful?
We are looking at persistent memory and have to notify the target of completed RDMA writes. The alternative is sending a message, but that seems to have higher latency.
WRITE_WITH_IMM would be more efficient, but the difference should not be very big: you can send the message immediately after ucp_ep_put() + ucp_worker_fence(), with no need to wait for completion. In the longer term we will probably add a UCP API for "signaling puts" that will use WRITE_WITH_IMM.
Thanks for your help. I will try the fence.
Part of the problem is the size of the IMM data defined by the IB standard. As far as I can remember, it is 40 bits, which is not very useful.
40 bits would be sufficient to encode any offset in a 1 TB array, which fits our use case.
We would have to come up with a generic API that is not limited to 40 bits but can leverage the underlying HW capabilities. The user could probably query (or request) a specific IMM size during library initialization, and we could adjust the protocol based on that value.
For reference, the equivalent for uGNI is 48 bits.
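Assuming byte-granular offsets, an n-bit immediate can address 2^n bytes, which confirms the 1 TB figure for 40 bits and gives 256 TiB for uGNI's 48 bits. For comparison, the `imm_data` field of libibverbs' `struct ibv_send_wr` is 32 bits wide, which covers 4 GiB at byte granularity (more if offsets are counted in larger units).

```latex
\begin{align*}
2^{32}\ \text{B} &= 4\,294\,967\,296\ \text{B} = 4\ \text{GiB} \\
2^{40}\ \text{B} &= 1\,099\,511\,627\,776\ \text{B} = 1\ \text{TiB} \\
2^{48}\ \text{B} &= 281\,474\,976\,710\,656\ \text{B} = 256\ \text{TiB}
\end{align*}
```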
Hi, I have the following questions about RDMA write operations:
- What is the difference between RDMA write and RDMA write with immediate?
- Can the server process the received data from its buffer when the client uses a plain RDMA write? (Assume the client sends data to the server with RDMA write and I want to process that data in the server-side buffer.)
- Can the server process the received data from its buffer when the client uses RDMA write with immediate?
Please help me understand the above scenario. Thanks.
@abhishek-ml-ai Since this is a question about the IB standard, I believe it is best to refer to the standard itself, https://www.afs.enea.it/asantoro/V1r1_2_1.Release_12062007.pdf (page 129 is a good starting point; you can also grep the document for "immediate").
Are there any plans to implement a feature similar to RDMA_WRITE_WITH_IMM?
Such a feature is useful for alerting the receiver that the RDMA write has completed. The target of the RDMA write (ucp_put) can, for instance, poll for the completion carrying the immediate value to know that the write operation has completed.
If not RDMA_WRITE_WITH_IMM, is there another UCP feature that can serve the above purpose?
I have a use case for this feature: implementing MPI Partitioned Point-to-Point Communication in MPI libraries. See Section IV-A of [1] for details.
I can see that we could extend ucp_request_param_t to include imm_data and then call ucp_put_nbx(...) as normal. However, with the current API it is unclear how the target side could read the immediate data from the work completion. I am happy to implement this feature myself, but I will need some guidance on how we want to expose it to the user.
[1] Yiltan Temucin, Scott Levy, Whit Schonbein, Ryan Grant, and Ahmad Afsahi, "A Dynamic Network-Native MPI Partitioned Aggregation over InfiniBand Verbs," 25th IEEE Cluster 2023, Santa Fe, NM, USA, Oct. 31 to Nov. 1, 2023.