rc_verbs_ep_fence Cannot fence RDMA_WRITE after RDMA_READ
Problem: Out-of-Order Execution of Requests in RNIC
According to Table 79 and Table 80 in the IB Specification and link, a later RDMA WRITE request can complete before the RDMA READ that precedes it on the same queue pair. This happens a lot as I tested with libibverbs. The problem can be solved by adding the IBV_SEND_FENCE flag to the later RDMA WRITE request.

Impact
Reading the code of uct_ep_fence, I find that IBV_SEND_FENCE is currently added only to RDMA READ operations, so a later RDMA WRITE can still complete before an earlier RDMA READ. Specifically, under rc_verbs no fence is added to a uct_ep_put_bcopy that follows a uct_ep_get_bcopy. If a get_bcopy is followed by a put_bcopy to the same address, the get_bcopy may return the value written by the later put_bcopy. I believe this implementation contradicts the description of uct_ep_fence.
This issue is related to this pull request.
@bernardshen
> This happens a lot as I tested with libibverbs.
Do you have example code based on libibverbs that shows this really happens?