ircv3-specifications labeled-response=all for more strict guarantees

Specifies https://github.com/ircv3/ircv3-ideas/issues/70. Lets clients have a more firm guarantee about relying on labeled-responses for all of their response tracking. Basically, if your server advertises labeled-response=all then the "where it's feasible to do so" language doesn't apply to you – you ALWAYS send a labeled response, to all commands. There's been interest in adopting this elsewhere.

The "comma (,) separated list of flags" thing is mostly for safety, I'm sure we won't change it in the future but we've said that before and then had to do so later on~
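A minimal sketch of how a client might read the cap value as a comma-separated list of flags, so that unknown future flags are ignored rather than breaking the check for "all". The function names are hypothetical, and "future-flag" in the usage note is an invented placeholder, not a proposed flag.

```python
def parse_labeled_response_flags(cap_token):
    """Return the set of flags in a token such as 'labeled-response=all'."""
    name, _, value = cap_token.partition("=")
    if name != "labeled-response":
        raise ValueError("not a labeled-response token: %r" % cap_token)
    # An absent or empty value means no flags at all.
    return {flag for flag in value.split(",") if flag}


def is_strict(cap_token):
    """True when the server advertises the 'all' flag."""
    return "all" in parse_labeled_response_flags(cap_token)
```

With this, is_strict("labeled-response=all,future-flag") is still true, while a bare "labeled-response" yields an empty flag set.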

Jun 15 '20 19:06 DanielOaks

What exactly is the difficulty for clients that this solves? They have to handle servers with no labeled-response at all anyway so I'm struggling to understand the problem.

Jun 15 '20 19:06 jwheare

@jwheare First, if this gains popularity across servers, generalist clients might eventually be able to drop their fallback code for servers without labeled-response=all. Or just disable it, as fallbacks may be less reliable than the real thing (eg. if servers send unexpected numerics).

And labeled-response=all is way more useful than labeled-response alone: clients can completely disable their fallbacks because they don't have to worry about some messages missing a label, which means no risk of false positives.

Jun 15 '20 19:06 progval

What exactly is the difficulty for clients that this solves? They have to handle servers with no labeled-response at all anyway so I'm struggling to understand the problem.

without it things like bot frameworks or clients that have heavy automation still need to jump through most of the same hoops in case the server leans on the "feasible" clause and doesn't bother to send a labeled response to something. if they see =all, they can turn that off.

I can also imagine something like a new web client aimed at network operators listing IRCv3 caps as prerequisites.

Jun 15 '20 20:06 edk0

Is this a hypothetical or is there a specific client implementation that is struggling with something concrete?

An example involving irc protocol traffic and code/pseudo would be great.

I don’t really understand the concept of turning off fallbacks. Usually you have several code paths, and one is chosen at runtime. The other code path isn’t “turned off” it’s just not used, but it’s still there.

Jun 15 '20 20:06 jwheare

The other code path isn’t “turned off” it’s just not used, but it’s still there.

This is what I meant. And if it's not used, it means it won't trigger bugs.

And (IMO) the goal is get rid of it in the long term, with new clients not implementing this fallback showing an error early in the connection. (Like @edk0 mentioned, but possibly end-user applications at some point.)
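A rough sketch of the "one code path chosen at runtime" idea, assuming the client keeps the advertised caps in a dict mapping cap name to value (None when there is no value). The enum, the function and the require_strict option are hypothetical names, not taken from any real client.

```python
from enum import Enum


class Tracking(Enum):
    STRICT_LABELS = "labels only"
    LABELS_WITH_FALLBACK = "labels plus numeric heuristics"
    HEURISTICS_ONLY = "numeric heuristics only"


def choose_tracking(server_caps, require_strict=False):
    """Pick a response-tracking strategy from the advertised caps."""
    if "labeled-response" not in server_caps:
        mode = Tracking.HEURISTICS_ONLY
    else:
        flags = (server_caps["labeled-response"] or "").split(",")
        if "all" in flags:
            mode = Tracking.STRICT_LABELS
        else:
            mode = Tracking.LABELS_WITH_FALLBACK
    # A client without any fallback code can refuse to continue early.
    if require_strict and mode is not Tracking.STRICT_LABELS:
        raise RuntimeError("server does not advertise labeled-response=all")
    return mode
```

A client that ships no fallback at all would call choose_tracking(caps, require_strict=True) right after CAP LS and show the error to the user.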

Jun 15 '20 20:06 progval

Eventually I'd like to remove the route_replies module from ZNC, because it's full of heuristics and often needs to be updated when new numerics are discovered in the wild.

Jun 15 '20 21:06 DarthGandalf

It would be nice to be able to have a strict guarantee but realistically I don't think it's feasibly implementable by existing implementations unless they only support one server (which is an extreme minority of server implementations in the wild).

Typically specifications need implementations before stuff is approved. Do we have actual client and server implementations of this or is it just a theoretical thing that might maybe possibly be usable in the future?

Jun 16 '20 03:06 SadieCat

As I understand it, there are two fairly orthogonal issues here:

(1) Under the current spec, servers can refuse to label a response for any reason (because of the "where it is feasible to do so" language)
(2) S2S timeouts can cause commands to fail

To illustrate (1), it seems to me that a server could maliciously comply with the current spec by advertising labeled-response and then labeling only its ACKs, i.e., simply refusing to label any response with one or more lines.

To solve this problem, I'd suggest the following erratum, deleting the "where it is feasible to do so" phrase entirely, and changing "exactly one" message to "at most one":

For any message received from a client that includes this tag, the server MUST include the same tag and value in any response required from this message. Servers MUST include the tag in at most one logical message.

re. (2), as I understand it, the suggestion is that linked servers advertising labeled-response=all will respond to S2S timeouts or failures by sending a labeled FAIL message, possibly after a timeout. It's not really clear to me that this is useful to clients, as opposed to clients implementing their own timeouts. It seems like if the client is dependent on server responses here either for UX reasons or to do resource release, then the client won't be properly resilient to server bugs.
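For comparison, a client implementing its own timeouts might look roughly like this. It is only a sketch: the class name, the default timeout and the injectable clock are assumptions made for illustration.

```python
import time


class PendingLabels:
    """Tracks labeled requests that have not been answered yet."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._deadlines = {}  # label -> time at which we give up

    def sent(self, label):
        """Record that a request with this label was just sent."""
        self._deadlines[label] = self._clock() + self._timeout

    def resolved(self, label):
        """Mark a label as answered; False if it was unknown or expired."""
        return self._deadlines.pop(label, None) is not None

    def expire(self):
        """Remove and return the labels whose deadline has passed."""
        now = self._clock()
        expired = sorted(l for l, d in self._deadlines.items() if d <= now)
        for label in expired:
            del self._deadlines[label]
        return expired
```

Calling expire() periodically lets the client report an error for each returned label without depending on the server ever sending a FAIL.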

Jun 16 '20 08:06 slingamn

I agree with @SadieCat that it is hardly feasible to offer a 100% guarantee on a multiserver network. At UnrealIRCd we (can) probably label >99% of traffic, but here you are really saying: we are strict, we offer a 100% guarantee, "no ifs or buts". It is very likely that server coders are going to violate their 100% promise due to unforeseen circumstances or a bug. And IMO violating such a (new) promise is worse than what we do now. It's even worse if, on the client side, you are going to rely on it 100% and erase any code paths that deal with fallback/checks.

I also wonder how useful this guarantee is, if for example the server pingfreq is 60, server times out after 120 seconds and you get your FAIL timeout for the labeled-response after 120 seconds.. how useful is that? Did that really solve a problem? Don't you have to handle such timeouts anyway? (possibly way before 120s?)

I also agree with @jwheare that it is far more likely that the client has to support the various cases anyway, such as: a server not supporting labeled-response, a server that does not offer the 100% guarantee, you have to deal with timeouts, etc. As long as you have that, there is little to no benefit to 'strict'.

Also, with coding in general, and certainly in the field of security, it is good practice to deal with servers breaking their promise and guarantees. Not doing so is usually what creates bugs, sometimes even security bugs. You always have to be wary about that.

Jun 22 '20 14:06 syzop

It is very likely that server coders are going to violate their 100% promise due to unforeseen circumstances or a bug.

Violating the spec is ok, as long as it's considered a bug and will be fixed. That's true for any specification.

Jun 22 '20 15:06 progval

I agree with @SadieCat that it is hardly feasible to offer a 100% guarantee on a multiserver network.

To be clear, the relevance of "multiserver" here is merely that it increases the implementation complexity of providing the guarantee, not that it makes it conceptually impossible to provide the guarantee, right? Multiserver networks can just implement this guarantee by adding timeouts. It still seems to me that the nuts-and-bolts content of this proposal is to shift responsibility for certain timeouts from the client to the server.

I agree with the rest of your comment that this proposal appears to encourage brittle client code. I'm also concerned about the potential confusion from specifying two subtly different versions of the guarantee (hence the desire to "meet in the middle" and do an unconditional erratum with an intermediate version of the guarantee).

Jun 22 '20 17:06 slingamn

Multiserver networks can just implement this guarantee by adding timeouts.

I'm personally not interested in implementing timeouts because I've seen cases where remote servers (e.g. services) have taken up to 30 seconds to respond previously and imo waiting that long before sending an ACK provides a really shitty user experience.

Jun 22 '20 17:06 SadieCat

It's not 100% clear to me what this proposed amendment would require, or recommend, that you do in that case. But on my reading, it would allow you to send a labeled FAIL after 5 seconds, then suppress the actual response if/when it arrives. (Whereas it would actually disallow you from forwarding the response unlabeled, or sending the response labeled after having already sent a labeled FAIL.)
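Server-side, that reading could be sketched as below. Everything here is an assumption for illustration: the class, the batch reference "ref1", and in particular the FAIL code REMOTE_TIMEOUT, which is not defined anywhere. Lines passed in are assumed to carry no message tags of their own.

```python
def label_lines(label, lines, batch_ref="ref1"):
    """Attach a label to a reply: ACK, a single line, or a batch."""
    if not lines:
        return ["@label=%s ACK" % label]
    if len(lines) == 1:
        return ["@label=%s %s" % (label, lines[0])]
    out = ["@label=%s BATCH +%s labeled-response" % (label, batch_ref)]
    out += ["@batch=%s %s" % (batch_ref, line) for line in lines]
    out.append("BATCH -%s" % batch_ref)
    return out


class RemoteRequest:
    """One labeled command that is waiting on a remote server."""

    def __init__(self, label, command, sent_at, timeout=5.0):
        self.label = label
        self.command = command
        self.deadline = sent_at + timeout
        self.finished = False  # a labeled reply (or FAIL) already went out

    def on_tick(self, now):
        """Lines to send to the client when the timer fires."""
        if self.finished or now < self.deadline:
            return []
        self.finished = True
        # REMOTE_TIMEOUT is a made-up code for this sketch.
        return ["@label=%s FAIL %s REMOTE_TIMEOUT :Remote server did not respond in time"
                % (self.label, self.command)]

    def on_remote_reply(self, lines):
        """Lines to send when the remote reply finally arrives."""
        if self.finished:
            return []  # late reply: suppressed, never forwarded unlabeled
        self.finished = True
        return label_lines(self.label, lines)
```

The finished flag is what enforces "at most one labeled response": whichever of the timer and the remote reply comes first wins, and the other produces nothing.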

Jun 22 '20 17:06 slingamn

It doesn't appear that this issue has much consensus but correct me if I'm wrong. Is anyone interested in implementing this or continuing the discussion?

Jul 30 '20 18:07 jwheare

I don’t really understand the concept of turning off fallbacks. Usually you have several code paths, and one is chosen at runtime. The other code path isn’t “turned off” it’s just not used, but it’s still there.

Some of the "fallback" logic in clients could potentially produce false positives, and thus it might be desirable to disable fallback logic when we know it shouldn't be needed. That can reduce the room for certain numerics falling into false positive handling when the server has indicated it has a strict labeled response implementation. This may decrease the room for client side bugs in some cases.

In Palaver, the implementation accounts for the following when sending a remote WHOIS with labeled response:

A labeled reply with a batch comes back, containing the entire WHOIS (this is the behaviour we see from Oragono: it doesn't support spanning tree and thus it is always strict).
A labeled reply comes in containing an ACK. In this case we now expect the WHOIS to come without a label afterwards (this is the behaviour we see from InspIRCd; if the nick isn't remote then it falls into the previous case).
We never get a labeled reply. Receiving any numeric that we expect from WHOIS after sending a labeled WHOIS makes us stop expecting a labeled reply for the command and instead collect those numerics up until the end of the WHOIS (this is the behaviour we see from UnrealIRCd).
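The three cases above can be sketched as a small state machine. This is not Palaver's actual code: the class and method names are hypothetical, the numeric set is an illustrative subset, and a lower layer is assumed to have already assembled labeled batches and split out the numeric.

```python
# Illustrative subset of numerics a WHOIS reply may contain.
WHOIS_NUMERICS = {"301", "311", "312", "313", "317", "319", "330"}
RPL_ENDOFWHOIS = "318"


class WhoisCollector:
    """Collects the reply to one labeled WHOIS."""

    def __init__(self):
        self.state = "waiting"  # waiting -> unlabeled -> done
        self.lines = []

    def on_labeled_response(self, lines):
        """Case 1: the whole WHOIS arrives as one labeled response."""
        if self.state != "done":
            self.lines = list(lines)
            self.state = "done"

    def on_labeled_ack(self):
        """Case 2: labeled ACK; the real reply will follow unlabeled."""
        if self.state == "waiting":
            self.state = "unlabeled"

    def on_unlabeled(self, numeric, line):
        """Case 3 (and the tail of case 2); True if the line was claimed."""
        if self.state == "done":
            return False
        if numeric != RPL_ENDOFWHOIS and numeric not in WHOIS_NUMERICS:
            return False
        self.lines.append(line)
        self.state = "done" if numeric == RPL_ENDOFWHOIS else "unlabeled"
        return True
```

Note that on_unlabeled claims any WHOIS-looking numeric while still in the waiting state, which is exactly where a numeric caused by some other command can be picked up by mistake.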

Originally we had improper handling with InspIRCd, as it responded to WHOIS with an ACK, which other servers did not. Palaver treated that ACK as the entire WHOIS response, thus showing the nick as offline and disregarding the WHOIS numerics that followed. We've now accounted for that case too.

With the fallback for the behaviour from UnrealIRCd (and possibly InspIRCd), a timely RPL_AWAY numeric after we are expecting a labeled whois response may interfere with the "strict" implementation for servers like Oragono.

For example, the RPL_AWAY numeric can be presented from other commands such as PRIVMSG, or INVITE etc.

C: INVITE kylef #example
S: :irc-us-east-1.darkscience.net 341 doe kylef :#example
S: :irc-us-east-1.darkscience.net 301 doe kylef :brb

Now let's try sending INVITE and WHOIS together:

C: INVITE kylef #foo
C: @label=1 WHOIS kylef kylef

S: :irc.example.com 341 doe kylef :#foo
S: :irc.example.com 301 doe kylef :brb

S: ...
S: :irc.example.com 301 doe kylef :brb
S: ...
S: :irc.example.com 318 doe kylef :End of /WHOIS list.

The first 301 reply might look like a response to the WHOIS request, and thus client logic might think "we won't be getting a labeled response for this request, and this 301 appears to be from our WHOIS" when accounting for the fallback behaviours above. On strict servers it may later be followed by the actual labeled response. This case is perhaps avoidable if the INVITE message is also sent with a label, although without labeled-response being strict a client cannot assume it will get a labeled response.
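To make the misattribution concrete, here is a tiny sketch of such a fallback check run against the two unlabeled lines from the trace above. The function names and the numeric set are assumptions for illustration only.

```python
# Illustrative subset of numerics the fallback associates with WHOIS.
WHOIS_NUMERICS = {"301", "311", "312", "317", "318", "319"}


def numeric_of(line):
    """Return the command/numeric of a raw line, skipping tags and prefix."""
    parts = line.split(" ")
    if parts[0].startswith("@"):
        parts = parts[1:]
    if parts[0].startswith(":"):
        parts = parts[1:]
    return parts[0]


def fallback_claims(line, strict):
    """Would the fallback attribute this unlabeled line to a pending WHOIS?"""
    if strict:
        # With labeled-response=all the fallback is never consulted.
        return False
    return numeric_of(line) in WHOIS_NUMERICS


trace = [
    ":irc.example.com 341 doe kylef :#foo",
    ":irc.example.com 301 doe kylef :brb",
]
claimed = [line for line in trace if fallback_claims(line, strict=False)]
```

Here claimed contains the 301 that really belongs to the INVITE; with strict=True nothing is claimed and the client simply waits for the labeled response.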

While it is not mandatory, it can reduce the surface for bugs when dealing with strict servers as we can prevent potentially unreliable fallback logic from triggering. Having a way for a server to indicate it has strict labeled response makes a lot of sense to me. Some clients may even desire to only support strict labeled response to reduce the complexity and test surface of their client.

Having a particular response to a labeled request to indicate that the response may be coming without a label later would be a better situation than we have now too. For example:

C: @label=1 WHOIS kylef kylef
S: @label=1 a response message which indicates the server has received the request and we are not going to give you a response with a label for it, so you may expect a response later

Some servers may support strict in cases where there isn't a linked server, or bouncers depending on the upstream server's capabilities. Thus it should be possible for a server to indicate to clients when that changes, with cap-notify or similar. Would a change in the parameters while cap-notify is enabled send a CAP DEL/NEW command and require the client to re-enable the capability?

S: CAP * LS :labeled-response=all
C: CAP REQ :labeled-response
S: CAP * ACK :labeled-response

...

// server no longer supports strict labeled-response
S: CAP <nick> DEL :labeled-response
C: CAP REQ :labeled-response
S: CAP * ACK :labeled-response

For the case where cap-notify is not supported or enabled, the guarantee might prove difficult to uphold if a client was advertised strict labeled-response and that is no longer possible.
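On the client side, following such changes could look roughly like this. It is a sketch under assumptions: the class is hypothetical, it only tracks what the server advertises (not whether the cap is currently enabled), and it leaves open the question above of whether a re-REQ is needed.

```python
class LabeledResponseState:
    """Tracks whether labeled-response is advertised, and whether strictly."""

    def __init__(self):
        self.available = False
        self.strict = False

    def handle_cap(self, subcommand, tokens):
        """Apply a CAP LS / NEW / DEL line's space-separated cap list."""
        for token in tokens.split():
            name, _, value = token.partition("=")
            if name != "labeled-response":
                continue
            if subcommand in ("LS", "NEW"):
                self.available = True
                self.strict = "all" in value.split(",")
            elif subcommand == "DEL":
                # Once the cap is withdrawn, stop trusting the guarantee.
                self.available = False
                self.strict = False
```

A client would re-evaluate its fallback logic whenever strict flips, for example after a CAP DEL followed by a CAP NEW without the "all" flag.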

Aug 02 '20 14:08 kylef

I'm personally not interested in implementing timeouts because I've seen cases where remote servers (e.g. services) have taken up to 30 seconds to respond previously and imo waiting that long before sending an ACK provides a really shitty user experience.

Regardless of whether there is an ACK or not before the 30 seconds, the experience for the end user is the same. They sent a WHOIS and are waiting.

Aug 02 '20 14:08 kylef

Thanks, this was illuminating.

I'm still not really sure that a specification change is the appropriate solution to this problem. For example, it would still be possible to maliciously comply with labeled-response=all without really improving client compatibility: a server could hypothetically advertise labeled-response=all, then respond to labeled remote WHOIS by sending a labeled ACK and then suppressing the actual reply. (Since this behavior is correct/expected under network partition, it cannot be disallowed by the spec.)

It seems to me that the problem isn't the weakness of the spec, it's that major server implementations fail to label a significant class of responses in the typical case, necessitating client-side fallbacks. If the typical-case behavior was to label the response, then clients could just wait for a labeled response, then time out and report an error to the user in the atypical case where they don't get one (and ignore any unlabeled response).

With the fallback for the behaviour from UnrealIRCd (and possibly InspIRCd), a timely RPL_AWAY numeric after we are expecting a labeled whois response may interfere with the "strict" implementation for servers like Oragono.

Seems like assuming that WHOIS responses MUST start with a 311 RPL_WHOISUSER would make for more robust behavior? Are there any major servers that don't conform to this?
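A sketch of that idea: the unlabeled fallback only starts collecting once it sees 311, so an earlier stray 301 is ignored. The function is hypothetical, it assumes every line has a source prefix and no tags, and the 311 line in the usage note is invented for the example.

```python
RPL_WHOISUSER = "311"
RPL_ENDOFWHOIS = "318"
# Illustrative subset of numerics allowed inside a WHOIS reply.
WHOIS_BODY = {"301", "312", "313", "317", "319", "330"}


def collect_unlabeled_whois(lines):
    """Return the lines from the first 311 through the following 318."""
    collected = []
    started = False
    for line in lines:
        numeric = line.split(" ")[1]
        if not started:
            if numeric != RPL_WHOISUSER:
                continue  # e.g. a 301 caused by an earlier INVITE
            started = True
            collected.append(line)
        elif numeric == RPL_ENDOFWHOIS:
            collected.append(line)
            break
        elif numeric in WHOIS_BODY:
            collected.append(line)
    return collected
```

Fed the INVITE-plus-WHOIS trace (341, 301, then a made-up ":irc.example.com 311 doe kylef kyle example.org * :Kyle", 301, 318), it returns only the last three lines. It would not cover replies that never include a 311.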

Aug 02 '20 19:08 slingamn

Obviously servers can cause client headaches by maliciously complying. They can also cause client headaches by just advertising =all and not meeting the requirement. The point of the spec is to help non-malicious server implementations help clients.

Anyway, I intend to implement this extension along with labeled-response; spending time trying to convince people of its value isn't very interesting to me. It would be nice if it could be standardised so that I know it won't clash with any future uses of the labeled-response cap value.

Aug 02 '20 20:08 edk0
