UCT/IB/UD: Fix connection matching
What
Fix connection matching in UD transport.
Why ?
If user destroyed UCT EP, we shouldn't disconenct it, because it could be used for RX operations.
How ?
- Introduce TX and RX flags.
- If both TX and RX flags are set, endpoint shouldn't be added to CEP.
- If EP has TX-only - add to expected connection matching queue, if has RX-only - add to unexpected one.
- Send
FINpacket if closing won't use TX capability anymore on EP. - Handle
FINpacket to remove RX capability on a EP-receiver.
@brminich @hoopoepg could you review pls?
@yosefe @brminich could you review pls?
@brminich @yosefe could you review pls?
in general, maybe we don't need these fixes for keepalive? I would expect that when using CM, ud ep will be connected in p2p mode, so it will not use conn_match flow
we need it for CONNECT_TO_WORKER mode anyway.
btw, it was bug in Keepalive mode selection (fixed by #8113), we used AM-based keepalive - so we created reply EP to send EP_REMOVED with CONNECT_TO_WORKER mode. I'm testing #8113 to check whether it is enough to fix "ACKed and TX PSNs mismatch".
in general, maybe we don't need these fixes for keepalive? I would expect that when using CM, ud ep will be connected in p2p mode, so it will not use conn_match flow
we need it for CONNECT_TO_WORKER mode anyway. btw, it was bug in Keepalive mode selection (fixed by #8113), we used AM-based keepalive - so we created reply EP to send
EP_REMOVEDwithCONNECT_TO_WORKERmode. I'm testing #8113 to check whether it is enough to fix "ACKed and TX PSNs mismatch".
Can we fix the p2p keepalive prior to fixing CONNECT_TO_WORKER mode, to make io_demo pass?
in general, maybe we don't need these fixes for keepalive? I would expect that when using CM, ud ep will be connected in p2p mode, so it will not use conn_match flow
we need it for CONNECT_TO_WORKER mode anyway. btw, it was bug in Keepalive mode selection (fixed by #8113), we used AM-based keepalive - so we created reply EP to send
EP_REMOVEDwithCONNECT_TO_WORKERmode. I'm testing #8113 to check whether it is enough to fix "ACKed and TX PSNs mismatch".Can we fix the p2p keepalive prior to fixing CONNECT_TO_WORKER mode, to make io_demo pass?
@yosefe @brminich as discussed offline, we need to add dropping packets if endpoint was disconnected, but we should fix connection matching to implement it
In general, conn_match seems not really neeeded for keepalive with CM. We can force packet drop on a disconnected ud endpoint only on p2p-connected endpoints, so it will not affect ep-to-iface scenario.
do you mean checking DISCONNECTED flag for CONNECT_TO_EP endpoints only in 1.13? it will have extra branch in uct_ud_ep_process_rx
do you mean checking DISCONNECTED flag for CONNECT_TO_EP endpoints only in 1.13? it will have extra branch in
uct_ud_ep_process_rx
Yes: Let's start with a fix to check DISCONNECTED only for connect-to-ep. Maybe we can unite it with the dest_id check, by changing ep->dest_id of a disconnected p2p endpoint to an invalid value (instead of setting a DISCONNECTED flag). Make sure it solves the timeouts issue and port it to v1.13.x I think the full connecting matching it's a complex change in a "sensitive" place that may break other use cases, so better postpone it for now. For example maybe need to move this logic to another layer, make sure that we don't send more wireup packets now, etc.
Let's start with a fix to check DISCONNECTED only for connect-to-ep. Maybe we can unite it with the dest_id check, by changing ep->dest_id of a disconnected p2p endpoint to an invalid value (instead of setting a DISCONNECTED flag). Make sure it solves the timeouts issue and port it to v1.13.x
submitted #8151 which checks CONNECT_TO_EP | DISCONNECT flags to make a decision about dropping RX packet.
currently, testing it with IO-demo reproducer of "timeout waiting for N replies".