multipath Abandon Frame needs Error Code definition

Section 7.2: Should Error Code reference the number space defined by transport or the IANA Registry?

Jul 10 '24 14:07 gloinul

Actually, I find that the current draft lacks a definition of the error code for path abandon. The error codes registered in the IANA Registry are used for the Connection Close Frame. We definitely need to add this part.

Jul 11 '24 13:07 Yanmei-Liu

Do we really want to replicate the structure of connection close, with 2 frame types, one "abandon by application" and the other "abandon by transport"? Or do we want to follow the example of "Stream Reset" and "Stop Sending", which just carry an application defined code?

I am for "application defined", because I think there are only two cases in which the transport originates an Abandon frame "by itself": abandon on timer, and abandon in response to an abandon by the peer. Using error code=0 for these two cases seems like a fine simplification.

Jul 11 '24 17:07 huitema

This was discussed in the session at IETF-120. The decision was to remove the Reason Phrase and define an MP-specific error code space, e.g. including error codes like "closed by application", "out of resources", ...
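
For illustration only, an MP-specific code space along these lines could be sketched as below. The enum name, the member names, and all numeric values are placeholders invented for this sketch; they are not taken from the draft or from any IANA registry.

```python
from enum import IntEnum


class PathAbandonCode(IntEnum):
    """Hypothetical MP-specific code space; names and values are placeholders."""

    NO_ERROR = 0x0
    CLOSED_BY_APPLICATION = 0x1  # "closed by application"
    OUT_OF_RESOURCES = 0x2       # "out of resources"


def describe(code: int) -> str:
    """Return a log-friendly name, tolerating codes this endpoint does not know."""
    try:
        return PathAbandonCode(code).name
    except ValueError:
        # An unknown value is not fatal here; it is only reported as such.
        return f"UNKNOWN(0x{code:x})"
```

Since no Reason Phrase would be carried, a helper like `describe` is all a receiver would have for logging.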

Sep 02 '24 15:09 mirjak

I am trying to address this issue with PR #445

Oct 03 '24 09:10 yfmascgy

Thanks for starting the PR @yfmascgy. In the discussion @marten-seemann proposed to define a separate error code space rather than using the existing QUIC error code space. Do people have opinions on that?

Oct 07 '24 13:10 mirjak

One more question: should the PR also add information on which error code to use in each case where we require or recommend sending a path abandon frame? E.g. if you close a path because of a local timeout, you should probably use "NO_ERROR". If you close a path that the other end tries to open, you could use "NO_ERROR" or maybe something more specific like "RESOURCE_LIMIT_REACHED", which indicates to the other end that it should not try to open this path (or any other path) again right now...?
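
To make the question concrete, the sender-side choice could be sketched roughly as follows. Only the two code names come from the discussion; the numeric values, the `AbandonCause` enum, and the `at_resource_limit` flag are assumptions made for this sketch, and the mapping shows one possible answer, not an agreed rule.

```python
from enum import Enum, auto

# Placeholder values; the actual code points are not settled in this thread.
NO_ERROR = 0x0
RESOURCE_LIMIT_REACHED = 0x1


class AbandonCause(Enum):
    """Hypothetical local reasons for sending a path abandon frame."""

    LOCAL_TIMEOUT = auto()
    REFUSE_PEER_PATH = auto()


def code_for(cause: AbandonCause, at_resource_limit: bool = False) -> int:
    """Pick the code to send for a given local cause."""
    if cause is AbandonCause.LOCAL_TIMEOUT:
        return NO_ERROR
    if cause is AbandonCause.REFUSE_PEER_PATH:
        # The more specific code hints that the peer should not retry right now.
        return RESOURCE_LIMIT_REACHED if at_resource_limit else NO_ERROR
    raise ValueError(f"unhandled cause: {cause!r}")
```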

Oct 07 '24 13:10 mirjak

I think we should keep using the QUIC code space. That was pretty much the recommendation during the WG meeting.

Oct 15 '24 06:10 huitema

Reposting here for the discussion what I just posted in the PR: I think using the QUIC error code space is wrong. I think we should not call this an error code at all. In case of connection close, you actually close the connection because there was unexpected protocol behavior. In case of the multipath extension, it is the same: if there is something unexpected, you have to close the whole connection. Therefore closing a path is never an error but always an explicit decision.

Oct 17 '24 10:10 mirjak

I rewatched part of the session and want to note a couple of points here.

First, the reason for removing the reason phrase is that processing that (variable-length) string might create quite some overhead. You can just not implement that for connection close, as you'll close the connection anyway, but for other frames that's probably something to avoid (using error codes instead). However, note that if we remove the reason phrase, we no longer have any way to communicate a reason from the application for why it decided to close a path.

Further, if we have an error code we could probably also add another error code for "unstable_interface" or something like this. Thinking more about this, we could probably also add another code for "no_CID_available" (even though I'm not sure here, because that might actually be a real error that warrants closing the connection?). These could actually be useful for the receiver to decide whether to try to re-open the path.
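
As a rough sketch of how a receiver might use such codes: the numeric values below are placeholders, and treating exactly these two codes as "do not retry right now" is an assumption made for illustration, not something the draft specifies.

```python
# Placeholder values, invented for this sketch.
NO_ERROR = 0x0
UNSTABLE_INTERFACE = 0x1
NO_CID_AVAILABLE = 0x2

# Assumption: these codes signal that an immediate retry is unlikely to help.
DO_NOT_RETRY_NOW = frozenset({UNSTABLE_INTERFACE, NO_CID_AVAILABLE})


def should_reopen_path(code: int) -> bool:
    """Decide whether to try re-opening a path the peer just abandoned.

    Unknown codes are handled like NO_ERROR, so they do not block a retry.
    """
    return code not in DO_NOT_RETRY_NOW
```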

For me right now, the two possible ways forward are:

1) remove the error code (and reason phrase?)
2) define a new QUIC code space

Just for the sake of completeness the other two options discussed in the last session were: 3) add additional QUIC error codes (and restrict use in the abandon frame to a limited set?) 4) add an application layer error code instead

Oct 18 '24 15:10 mirjak

As a reference, HTTP/3's error code space contains codes that cover various scenarios.

These include unexpected protocol errors, such as detecting that the peer closed a stream it shouldn't have and responding with a CONNECTION_CLOSE of H3_CLOSED_CRITICAL_STREAM. But they also include expected behaviours that are not errors, such as when an endpoint cancels a request (a bit like path abandon) by sending a RESET_STREAM or STOP_SENDING using H3_REQUEST_CANCELLED.

In all these cases, a compliant H3 implementation abides by these requirements:

Although the reasons for closing streams and connections are called "errors", these actions do not necessarily indicate a problem with the connection or either implementation. For example, a stream can be reset if the requested resource is no longer needed.

An endpoint MAY choose to treat a stream error as a connection error under certain circumstances, closing the entire connection in response to a condition on a single stream. Implementations need to consider the impact on outstanding requests before making this choice.

Because new error codes can be defined without negotiation (see Section 9), use of an error code in an unexpected context or receipt of an unknown error code MUST be treated as equivalent to H3_NO_ERROR. However, closing a stream can have other effects regardless of the error code; for example, see Section 4.1.
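
A minimal sketch of the last rule on the receive side, covering only the "unknown error code" case. The three code values are from the HTTP/3 error code registry in RFC 9114; the function name and the `KNOWN_CODES` set are hypothetical, and a real implementation would list every code it understands.

```python
# Values from RFC 9114, Section 8.1.
H3_NO_ERROR = 0x0100
H3_CLOSED_CRITICAL_STREAM = 0x0104
H3_REQUEST_CANCELLED = 0x010C

# Hypothetical subset of codes this endpoint recognizes.
KNOWN_CODES = frozenset(
    {H3_NO_ERROR, H3_CLOSED_CRITICAL_STREAM, H3_REQUEST_CANCELLED}
)


def normalize_error_code(code: int, known=KNOWN_CODES) -> int:
    """Map any code this endpoint does not recognize to H3_NO_ERROR."""
    return code if code in known else H3_NO_ERROR
```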

As a recipient, an H3 implementation has to be prepared for a peer to use "the wrong" error code space (send a transport error when it was an application error), or send a generic code, or send an incorrect code. I don't think any of those cause any real problems for the recipient; it's mainly for debugging. And someone investigating these should always be skeptical of the peer not telling the ground truth.

On balance, I think having some error code is useful and it's fine to use the QUIC error code space. This is an example of an extension where we are building some more complexity into the transport and can expect to need to extend the shared codepoint space to suit.

Having the frame use values in a unique error code space is also a possibility. But then you'll want to create a separate registry for that. And it doesn't stop the peer sending any allowable value on the wire.

add an application layer error code instead

I'm not 100% sure what this means. If the suggestion is to use e.g. the HTTP/3 application error space, I don't think that works. If it's more like "allow applications to define their own means to communicate the reason a path was abandoned, such as a new application-layer message" then maybe, but it seems like a pain when we can solve a transport problem in the transport.

Oct 18 '24 15:10 LPardue

Just for the record on my point 4 above about application error codes: this was discussed at the last meeting. The proposal was to use an application error code, similar to RESET_STREAM, but also to reserve code 0 for QUIC-layer path abandon. Especially the last part (reserving 0) was rejected, so if we wanted to do that, we would need to define two frame types instead. Further, it was argued that there usually is no reason for the application to close a path (other than with stream reset). I'm not sure about this argument: I would assume opening a new path is usually triggered by the application (in the case of H3, I don't think it would be the H3 layer but the application on top using H3), and as such I'd assume closing a path might also be triggered by the application. It was further argued that having such an application interaction means you can't use multipath without changing the application. I believe we want to support both cases: one where the logic to manage paths lives in the QUIC stack, and one where the application wants tighter control. So at the moment, I'm still not fully convinced that an application error code would not be useful.

Oct 21 '24 12:10 mirjak

However, if we need an application-specific abandon frame (with a different frame type anyway) a future extension could add that frame later. As such it seems that we have sufficient agreement to merge PR #445 for now.

I think the only two open questions might rather be about the error codes themselves:

PR #445 proposes an error code for APPLICATION_ABANDON and it was questioned if that is really needed or if it is sufficient to use NO_ERROR instead. I guess if debugging is the main reason to have this error code at all, this could be useful? At least the argument was made that error codes are cheap and therefore it looks like we should keep it.
During the meeting another case was proposed for problems with the interface. Should we add a separate error code for INSTABLE_INTERFACE? It was also mentioned in the session at IETF-120 that this error code could actually be useful to indicate to the peer to not retry. So I guess this goes beyond debugging then. Maybe we merge PR #445 for now as proposed without this error code but continue the discussion...?

So in summary it looks like PR #445 should be merged now but we can keep this issue open for further discussion.

Oct 21 '24 12:10 mirjak

We merged #445 but are leaving this issue open for further discussion. Please provide input on the needed error codes!

Oct 21 '24 16:10 mirjak

Why would the peer be interested in and need to know the reason that the path is abandoned? There is no reason/"error" signaled when new paths are set up.

The only piece of potentially relevant information would be if the path is closed because of resource limits, which would imply that the peer (client in practice) should not try to immediately set up a new path. But how long would it have to wait before it tries again?

Oct 30 '24 08:10 michael-eriksson

If the PATH_ABANDON Error Code field remains, the error codes should definitely be in a different number space from the QUIC transport error codes. The two use cases are semantically very different and most of the error codes only make sense in one of the cases.

Oct 30 '24 08:10 michael-eriksson