multipath icon indicating copy to clipboard operation
multipath copied to clipboard

Review error codes

Open mirjak opened this issue 2 years ago • 12 comments

Currently, MP_PROTOCOL_VIOLATION is used is for a too high CID seq num in the frame as well as the handshake error case if multipath is negotiated but no CID is present.

However, we could also decided to define more specific error codes for one or both cases.

mirjak avatar Jul 05 '23 16:07 mirjak

handshake error case if multipath is negotiated but no CID is present

I think we use TRANSPORT_PARAMATER error? See https://quicwg.org/multipath/draft-ietf-quic-multipath.html#section-3-6. FWIW, my understanding is that use of such TRANSPORT_PARAMETER error is in line with us using TRANSPORT_PARAMETER error to indicate that zero CID was used in conjunction with the preferred_address Transport Parameter.

Regarding if we need more errors, I do not think there is a need at the moment. All the use of MP_PROTOCOL_VIOLATION is related to semantic errors in the newly added frames, we have Frame Type and Reason Phrase in the CONNECTION_CLOSE frame to communicate the problem in detail.

Honestly, I would not mind getting rid of MP_PROTOCOL_VIOLATION altogether considering that i) we have the Frame Type field to communicate which frame had a problem, and ii) for errors in the Connection ID sequence numbers being communicated inside RETIRE_CONNECTION_ID frame we use PROTOCOL_VIOLATION.

kazuho avatar Jul 06 '23 01:07 kazuho

We changed the handshake error to MP_PROTOCOL_VIOLATION in the editors copy yesterday based on the discussion at 116 as recorded in issue #157.

However, I think we need a more high-level discussion about all error codes, there I opened this issue for more discussion at 117.

mirjak avatar Jul 06 '23 09:07 mirjak

@mirjak Oh I missed that, thank you.

Regarding if we should have more error codes, my two cents goes to not, as we can use the Frame Type field of the CONNECTION_CLOSE frame to figure out the cause.

kazuho avatar Jul 06 '23 10:07 kazuho

Discussed in IETF 117, single MP-specific error code seems sufficient

LPardue avatar Jul 25 '23 16:07 LPardue

@LPardue Actually, there was s slightly different conclusion. The current error happens during the handshake, possibly before the MP option was negotiated. Implementation that are not multipath capable will just send "PROTOCOL_VIOLATION" and document the frame type. That might be good enough for us too, and we may be able to get rid of the multipath specific error code.

huitema avatar Jul 25 '23 17:07 huitema

There are currently 4 case where we send an error code:

  1. multipath is negotiated but there is no CID(s) -> MP_PROTOCOL_VIOLATION
  2. Cipher suites with too short nonce -> TRANSPORT_PARAMETER
  3. multipath frame within wrong packet type -> FRAME_ENCODING_ERROR
  4. seq number of CID (used in a frame) was is not valid -> MP_PROTOCOL_VIOLATION

@Lucas which error are you talking about?

mirjak avatar Jul 25 '23 18:07 mirjak

@LPardue @huitema not sure what we need to do here. Any more input?

mirjak avatar Oct 21 '23 12:10 mirjak

I don't think we need to change the spec now.

huitema avatar Oct 22 '23 00:10 huitema

Can we close this issue? Are we happy with the current use of error codes?

mirjak avatar Nov 03 '23 14:11 mirjak

What's the justification for having a dedicated MP_PROTOCOL_VIOLATION error? All other QUIC extensions so far have been using the standard PROTOCOL_VIOLATION error, instead of defining their own bespoke version.

marten-seemann avatar Nov 03 '23 15:11 marten-seemann

My take on the design philosophy of error codes in QUIC was a) an enpoint can always pick the most generic error code b) picking a more-specific error code is a kindness/affordance to the peer. An error code might help the receiver determine more quickly where a problem lies - but quite often these bugs are rare and easy to determine a root cause as long as awareness of any issue is raised (e.g. I got a CONNECTION_CLOSE when hitting a server, and here is the repro URL)

This stuff is a bit subjective.

I see 2 cases in the draft where MP_PROTOCOL_VIOLATION is used

  1. If an enable_multipath transport parameter is received and the carrying packet contains a zero length connection ID, the receiver MUST treat this as a connection error of type MP_PROTOCOL_VIOLATION and close the connection.

This is super easy to spot - its a critical error exceedingly early in a connection and its clear why the sender triggered it

  1. All multipath-specific frames relate to a Destination Connection ID sequence number. If an endpoint receives a Destination Connection ID sequence number greater than any previously sent to the peer, it MUST treat this as a connection error of type MP_PROTOCOL_VIOLATION.

This one doesn't seem as easy as the first. However, the endpoint that receives the bogus frame can just indicate the frame type in the CONNECTION_CLOSE. It might also say something in the reason phrase.

On balance, it feels to me that dropping MP_PROTOCOL_VIOLATION would not make anyone's lives too hard, while saving the overhead of defining and implementing the error code.

LPardue avatar Nov 03 '23 16:11 LPardue

Having reviewed the document now again, I also think we should drop MP_PROTOCOL_VIOLATION and use PROTOCOL_VIOLATION instead.

If we really want a more specific error code for the zero-cid case, I think we should define an error code for that specifically. Additionally, we could also define a path_id_limit error (now with the explicit path id) if people think that would be useful...?

mirjak avatar May 27 '24 21:05 mirjak

As discussed at IETF-120, PR #428 removes the MP_PROTOCOL_VIOLATION error code.

mirjak avatar Sep 02 '24 15:09 mirjak