
What to do on path time-out?

Open mirjak opened this issue 1 year ago • 9 comments

With multipath, you can have multiple paths open without using all of them concurrently (e.g. for standby). A path time-out therefore makes less sense. We probably need to consider whether the path is actively used or not. Only if you send on a path and all packets are eventually marked as lost is that probably a good indication that the path is broken and you should close it. If a path is not used, you still might want to keep it open to use it later. I am not sure that a requirement to send pings on all unused paths is that useful.

Another related issue is that if you only send non-ack-eliciting packets on a path (like ACKs) and the path is not used by the peer for sending, you might not be able to detect a path breakage. Maybe it would make sense to require sending ack-eliciting frames on all used paths from time to time?

mirjak avatar Jul 01 '24 16:07 mirjak

The timeout issue was already addressed in PR #377, see section "Idle Timeout". The main drawback of the current text is that it forces some keep-alive traffic for paths kept in standby. I think that's a reasonable compromise: if a path is never used for a very long time, there is no guarantee that it will still be available when the endpoints decide to use it.

huitema avatar Jul 02 '24 19:07 huitema

That's exactly the point that I would like to discuss further. Currently the text in the "idle timeout" section says:

Hosts SHOULD stop sending traffic on a path if for at least the period of the idle timeout.

Note that PR #377 did not change this text. However, I think that the assumption that if a path is not used for a while it is automatically broken is not useful, especially if you keep a path open as a standby on purpose.

The question of whether you want to send keep-alive traffic on such a path is independent for me, because it rather addresses the question of when to detect a path failure. I.e. I think there is nothing wrong with keeping a path open without using it; then, if your actively used path fails, you switch over to that path. If that one is then also not working, you can try to establish another path or close the connection. If you think it's likely that your standby path will break, of course it can make sense to actively probe liveness to avoid delays when you later actually try to use that path. However, I don't think we should require a path to be closed after an idle time, or require sending potentially unnecessary probing traffic.

mirjak avatar Jul 03 '24 14:07 mirjak

I think that we agree. The text in the "idle timeout" section says two things:

  • specify an idle timeout behavior similar to what was in previous drafts

  • require that if a path is abandoned because of idle timeout, the endpoints must explicitly abandon it.

The second point is sound. We could say the first one differently, and treat the decision to abandon paths after timeout as a local behavior. But if an endpoint is going to use an idle timer it is better to tell it to the peer, because the peer may want to send keep-alive traffic to avoid the closure, and it needs to know the path timeout to parameterize the keep-alive traffic.

huitema avatar Jul 03 '24 19:07 huitema

So, yes, I propose to only change the first point because there is no need to enforce anything here in the multipath case. If you only have a single path and it breaks, you have to close it at some pre-agreed time because you can't communicate anything else to the peer anymore. However, in the multipath case, as long as you still have at least one working path, there is actually no really good reason to force the close by a timeout, because either peer can at any time gracefully close the path by sending an abandon frame on another path.

This can be useful if e.g. an interface goes down but comes back after a short time. Of course you have to remember that a path is currently not working and somehow recognise when it's up again (by probing it?). But not sure we need to specify much here.

Alternatively, I think it would probably make sense to have different timeouts for the connection and for (sub-)paths, and we could revisit that question. But not enforcing anything and just leaving it as a local decision is the easiest.

On your text above: if the peer keeps sending keep-alive traffic and the endpoint keeps receiving and acknowledging it, then there should never be a time-out, no?

Maybe there are cases where an endpoint could decide to close a path that is only used for keep-alive traffic but that's a different case that we maybe should discuss separately (in the implementation considerations section?).

mirjak avatar Jul 04 '24 14:07 mirjak

The point of the idle timeout was so that an endpoint need not wake its radio to send closing packets -- it can just let the connection fade away. I think that's an important property we shouldn't lose when multipath is negotiated.

If an endpoint wanted to maintain the connection despite having nothing to send, keep-alive traffic is recommended. The idle timeout informs the peer of the minimum frequency at which it needs to send keep-alive traffic if it wishes to do so.
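As a rough illustration of how the idle timeout can parameterize keep-alive traffic, here is a minimal Python sketch. It assumes the RFC 9000 rules that the effective idle timeout is the minimum of the two advertised `max_idle_timeout` values (a value of 0 meaning "not advertised") and that it is raised to at least three times the current PTO. The function names and the `fraction=0.5` safety margin are hypothetical choices for illustration, not anything taken from the draft.

```python
from typing import Optional


def effective_idle_timeout(local: float, peer: float, pto: float) -> Optional[float]:
    """Return the effective idle timeout in seconds, or None if disabled.

    `local` and `peer` are the advertised max_idle_timeout values in seconds;
    0 means the endpoint did not advertise a timeout.
    """
    advertised = [t for t in (local, peer) if t > 0]
    if not advertised:
        return None  # neither endpoint advertised a timeout
    # Use the smaller advertised value, but never less than 3 * PTO.
    return max(min(advertised), 3 * pto)


def keepalive_interval(idle_timeout: float, fraction: float = 0.5) -> float:
    """Hypothetical policy: send a keep-alive after a fraction of the timeout."""
    return idle_timeout * fraction
```

An endpoint that wants to keep a standby path alive could send an ack-eliciting packet (for example a PING) on that path whenever `keepalive_interval(...)` has elapsed without other ack-eliciting traffic; whether doing so is worthwhile remains a local decision.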

So the scenarios I think we care about for multipath:

  • If I want to keep a standby path alive, how frequently should I send keep-alive traffic?
  • When I go to send data, how are timed-out paths considered?
  • If I decide to let the entire multipath connection idle-close, at what point does that happen?

The normative text in #389 is unnecessary; endpoints don't need permission to close a path based on idle timeout, since they can explicitly close a path at any time for any reason. The existing statement that this reason still requires a PATH_ABANDON is sound.

Rather, I think what we need is guidance on the most productive behavior. I think that would be:

  • On any path you want to keep active, SHOULD send keep-alives based on the indicated timeouts.
  • If you don't do keep-alives, you SHOULD probe the path before using it unless it's the only remaining path.
  • If you are about to send traffic, you SHOULD send on a path that has not timed out.
  • Idle timeout is just like any other reason for closing a path; you MUST send a PATH_ABANDON on another path
    • ...though perhaps not immediately, to avoid waking radios. Bundle in the next packet on any path?
  • The connection is closed if all paths have been idle for longer than the idle timeout.
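The guidance above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Path` class, the `last_activity` field, and all function names are hypothetical, and "timed out" is used as a stand-in for "no keep-alives were sent on this path".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    path_id: int
    last_activity: float  # time of the last packet received or acknowledged


def is_timed_out(path: Path, now: float, idle_timeout: float) -> bool:
    return now - path.last_activity >= idle_timeout


def pick_send_path(paths: List[Path], now: float, idle_timeout: float) -> Optional[Path]:
    """Prefer the most recently active path that has not timed out."""
    if not paths:
        return None
    fresh = [p for p in paths if not is_timed_out(p, now, idle_timeout)]
    # Fall back to timed-out paths only when nothing fresh is left.
    candidates = fresh or paths
    return max(candidates, key=lambda p: p.last_activity)


def needs_probe(path: Path, paths: List[Path], now: float, idle_timeout: float) -> bool:
    """Probe a timed-out path before use, unless it is the only remaining path."""
    return is_timed_out(path, now, idle_timeout) and len(paths) > 1


def connection_idle(paths: List[Path], now: float, idle_timeout: float) -> bool:
    """The connection idles out only when every path has timed out."""
    return all(is_timed_out(p, now, idle_timeout) for p in paths)
```

Closing a timed-out path is deliberately left out of this sketch, since it would still require an explicit PATH_ABANDON rather than a silent close.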

MikeBishop avatar Jul 17 '24 21:07 MikeBishop

@MikeBishop it took me a bit of time to understand what "normative text in #389" you were speaking about, but I believe you are speaking of:

When more than one path is available, hosts shall monitor the arrival of
packets and acknowledgements for packets sent over each
path. Hosts MAY consider closing a path if for at least the period of the
idle timeout as specified in {{Section 10.1. of QUIC-TRANSPORT}}
(a) no packet was received or (b) no packet sent over this path was acknowledged.
Endpoints that desire to close a path because of the idle timer rule
rule if it would disqualify all available paths.
MUST do so explicitly by sending a PATH_ABANDON frame on another active path,
as defined in {{path-close}}.

There are two normative statements there, MAY (close a path), and MUST (send an explicit PATH ABANDON). There is also a pseudo normative one, shall monitor. I think that the paragraph does many things:

  • specify that endpoints shall monitor the paths. I am not sure about that. No interop issue arises if the endpoint does not.
  • say that endpoints MAY close a path because of timeouts,
  • remove the distinction between probing packets and non probing packets.
  • specify that if endpoints want to close a path upon timeout, they should use the same "idle timeout" defined for the connection, which implicitly tells endpoints "how frequently they should send keep-alive traffic".
  • specify that endpoints MUST send PATH ABANDON frames if they close because of timeout, which is kind of redundant but clearly specifies that we do not use the "silent close" process of RFC 9000.

I agree that the MAY close is a bit redundant, but I think it is important to have it to introduce the discussion of the other points. I would personally suggest striking out the first sentence of the paragraph, "shall monitor". And maybe belabor the "same timeout" point a bit more, because that does have interop consequences.

huitema avatar Jul 18 '24 14:07 huitema

@MikeBishop sending the PATH_ABANDON does not need to wake up the radio because you send it on another (active) path. However, sending keep-alives requires waking up the radio every time, and that is exactly the thing I want to avoid.

Without multipath, if there is only one active path, you need to define a clear point in time at which both endpoints close down if the path doesn't work anymore. That is not needed with multipath because you always send an explicit signal (PATH_ABANDON) on another path. With PR #377 and some other changes, the draft is now much clearer that it's always better to send an explicit signal and that there is basically no reason to do an implicit close (except for the last path and therefore the whole connection).

Of course, if you have an open path and you don't send keep-alives, the path can be broken when you try to use it again. However, I think it depends on your local knowledge whether sending keep-alives will help or is just a waste of resources. So there is no reason to have anything in the protocol itself that requires one or the other behavior.

mirjak avatar Jul 20 '24 19:07 mirjak

I think we're bringing a few different assumptions to this conversation, which is helpful. (And of course, if my assumptions are wrong, I'll happily learn better.)

I'm assuming that sending the PATH_ABANDON requires waking up some radio. That's why I think we should explicitly allow sending it to be deferred until you're sending a packet anyway; if the other path is actively being used, that's still immediate. If it's not, you avoid waking up any idle radios.
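To make the deferral idea concrete, here is a small hypothetical Python sketch: abandon notices for timed-out paths are queued and attached to the next packet that is being sent on some other path anyway. The class name, method names, and the string representation of frames are placeholders for illustration only, not a real QUIC stack API.

```python
from typing import List, Set


class AbandonScheduler:
    """Queue PATH_ABANDON notices and piggyback them on outgoing packets."""

    def __init__(self) -> None:
        self.pending: Set[int] = set()

    def on_path_idle_timeout(self, path_id: int) -> None:
        # Do not wake a radio now; just remember that the path must be abandoned.
        self.pending.add(path_id)

    def frames_for_packet(self, send_path_id: int, frames: List[str]) -> List[str]:
        """Attach pending notices to a packet that is being sent anyway."""
        # Following the thread, a path's abandon goes out on a different path.
        ready = sorted(p for p in self.pending if p != send_path_id)
        self.pending.difference_update(ready)
        return frames + [f"PATH_ABANDON(path_id={p})" for p in ready]
```

If another path is actively in use, the notice goes out with its very next packet, so the deferral is effectively immediate; if every path is idle, nothing is sent until there is a reason to send.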

I also think we're in agreement about keep-alives -- base QUIC recommends sending keep-alives if you actively want to maintain the connection, but leaves it to the application to decide when that is. HTTP/3 gives some recommendations about when that might be, but still leaves it as an implementation decision. Nothing needs to require keeping any specific path alive, but guidance about when it might be useful (or not) seems appropriate.

An endpoint might need to send ack-eliciting packets to avoid an idle timeout if it is expecting response data but does not have or is unable to send application data.

HTTP clients are expected to request that the transport keep connections open while there are responses outstanding for requests or server pushes

MikeBishop avatar Jul 22 '24 00:07 MikeBishop

This was discussed at IETF-120. The main take-away (as commented by MT) is probably that the idle timeout is a connection-level signal and we should not overload it for path management. However, we should still provide guidance about how often to send pings, though this might depend more on local network knowledge than on connection state. Likely we can remove the normative language here, or the above-cited sentence entirely. However, we probably need a slightly bigger PR to explain this correctly.

mirjak avatar Sep 03 '24 10:09 mirjak