MSC2162: Signaling Errors at Bridges

Open V02460 opened this issue 6 years ago • 10 comments

Add a new room event for signaling permanent errors occurring at bridges. References MSC1849.

Rendered

V02460 avatar Jul 09 '19 18:07 V02460

Is the somewhat dangerous affected_users field really necessary? It seems like the main objective here is to allow clients to flag messages that were not delivered by a bridge, and that can be done without regular expressions.

Ralith avatar Jul 09 '19 22:07 Ralith

It depends on what is deemed necessary. You are right that the main objective is to flag a message. That tells the user that something went wrong; adding the network name, the reason, or the affected users gives more detail on top of that.

A key question is what helps a user and what does not. Adding this general event type definitely helps imo, as it surfaces a previously invisible problem. The question now is how a user can act on it, and that should determine what is included and what is not. Here are the user reactions that came to my mind:

  • Trying to resend the message: no additional info required.
  • Reaching users via a different channel: affected_users required.
  • ~~Blaming~~ Bug report to the bridge/the network: network and reason required.

(Maybe an informed user is a goal in itself as well.)

If the affected_users attribute is left out, the second option is no longer possible¹. These benefits must be weighed against the problems it may cause, security-wise or otherwise. I don't find myself in a position to judge how severe a problem a regex/regex-like addition to the protocol really is. Maybe it's not so bad after all, maybe it's a really Bad Idea™. I would like to hear from more people about those points, so we can come to a conclusion.

¹ On a semantic level anyway. A user might be able to guess from the network name and the endings of people's nicknames, e.g. (Discord), that these were the ones affected.
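To make the regex concern concrete, here is a minimal sketch of how a client might match affected_users patterns against room members without handing an arbitrary regular expression to the regex engine. It assumes the patterns are plain strings in which only `*` is a wildcard; that syntax, the function names and the example user IDs are all assumptions for illustration, not something the proposal text quoted in this thread specifies.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape every character, then re-enable only "*" as "any run of characters".
    # Assumption: "*" is the sole wildcard; everything else is literal.
    escaped = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(escaped)

def affected_members(patterns, members):
    # Return the room members whose full user ID matches at least one pattern.
    compiled = [pattern_to_regex(p) for p in patterns]
    return [m for m in members if any(rx.fullmatch(m) for rx in compiled)]

# Hypothetical room membership and a hypothetical pattern from an error event.
members = [
    "@alice:example.org",
    "@_discord_1234:example.org",
    "@_discord_5678:example.org",
]
print(affected_members(["@_discord_*:example.org"], members))
```

Because the pattern is escaped first, characters such as `.` or `(` in a received pattern are treated literally rather than as regex syntax.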

V02460 avatar Jul 10 '19 08:07 V02460

I agree that it's useful to be able to tell which users in a room are there via a bridge. It might make more sense for that to be handled separately from error reporting, e.g. as done by the widely deployed IRC flair.

Ralith avatar Jul 10 '19 08:07 Ralith

> I agree that it's useful to be able to tell which users in a room are there via a bridge. It might make more sense for that to be handled separately from error reporting, e.g. as done by the widely deployed IRC flair.

That might be enough. Just wanted to note that this is slightly ambiguous as there might be more than one IRC bridge in a room.

V02460 avatar Jul 10 '19 09:07 V02460

> Just wanted to note that this is slightly ambiguous as there might be more than one IRC bridge in a room.

A more formal/structured scheme for mapping users to bridges might be nice, yeah, but I really think it would be best done independently of error reporting, since it is otherwise useful and avoids a complicated and potentially hazardous feature here.

Another problem that comes to mind is spoofing. How can we guarantee that reported errors are genuine? The most obvious answer is "configure power levels such that only the bridge bots or an admin can send the event," but that's not a strong guarantee, and it risks misconfiguration. One solution might be to always identify bridges by their Matrix-side bridge bot. Then errors sent by the bridge bot are inherently associated with the bridge in question, and things like human-readable network identifiers and the set of affected users can be determined robustly based on that information.
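For reference, the power-level approach could be checked roughly as follows. This is only a sketch: it reads the standard `events`, `events_default`, `users` and `users_default` keys of an `m.room.power_levels` content dict, treats the error as a non-state event, and uses `org.example.bridge_error` as a placeholder event type, since the real type name is not given in this thread.

```python
ERROR_EVENT_TYPE = "org.example.bridge_error"  # placeholder, not the proposal's name

def required_level(power_levels: dict, event_type: str) -> int:
    # Level needed to send a non-state event of this type.
    events = power_levels.get("events", {})
    return events.get(event_type, power_levels.get("events_default", 0))

def user_level(power_levels: dict, user_id: str) -> int:
    # Level of a given user, falling back to the room default.
    return power_levels.get("users", {}).get(user_id, power_levels.get("users_default", 0))

def can_send(power_levels: dict, user_id: str, event_type: str) -> bool:
    return user_level(power_levels, user_id) >= required_level(power_levels, event_type)

def ordinary_users_can_spoof(power_levels: dict, event_type: str) -> bool:
    # True if a user with the default level is allowed to send the event.
    return power_levels.get("users_default", 0) >= required_level(power_levels, event_type)

# Hypothetical room configuration restricting the error event to level 50.
power_levels = {
    "users_default": 0,
    "events_default": 0,
    "events": {ERROR_EVENT_TYPE: 50},
    "users": {"@bridgebot:example.org": 50},
}
print(ordinary_users_can_spoof(power_levels, ERROR_EVENT_TYPE))
print(can_send(power_levels, "@bridgebot:example.org", ERROR_EVENT_TYPE))
```

A room that never sets an entry for the error event falls back to `events_default`, which is exactly the misconfiguration risk mentioned above.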

Ralith avatar Jul 10 '19 17:07 Ralith

> One solution might be to always identify bridges by their matrix-side bridge bot. Then errors sent by the bridge bot are inherently associated with the bridge in question, and things like human-readable network identifiers and the set of affected users can be determined robustly based on that information.

This was my assumption about how it was supposed to work, and we should codify this in the proposal unless @V02460 has other ideas.

I had a proposal a long time ago which mapped bot mxids <=> bridge metadata in the room state.

Half-Shot avatar Jul 10 '19 17:07 Half-Shot

In particular, in the presence of such a mechanism I think both the "network" and "affected_users" keys would be best replaced by indirect lookup through that same mechanism, because it provides a single necessarily consistent source of truth.
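A rough sketch of what that indirect lookup could look like on the client side. It assumes some room-state mapping from bridge bot mxid to bridge metadata already exists; the `BridgeInfo` shape, the field names and the example IDs are invented for illustration and are not taken from any existing proposal.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

@dataclass(frozen=True)
class BridgeInfo:
    # Hypothetical metadata a room could publish per bridge bot.
    network: str
    bridged_users: FrozenSet[str]

def resolve_bridge_error(sender: str, bridges: Dict[str, BridgeInfo]) -> Optional[BridgeInfo]:
    # Trust the error only if its sender is a known bridge bot; the network
    # name and affected users then come from room state, not from the event.
    return bridges.get(sender)

# Hypothetical mapping derived from room state.
bridges = {
    "@discordbot:example.org": BridgeInfo(
        network="Discord",
        bridged_users=frozenset({"@_discord_1234:example.org"}),
    ),
}

info = resolve_bridge_error("@discordbot:example.org", bridges)
print(info.network if info else "ignored: sender is not a known bridge bot")
print(resolve_bridge_error("@mallory:example.org", bridges))
```

With this shape an error event needs to carry little more than a reference to the failed message and a reason, and an event from any other sender is simply ignored.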

Ralith avatar Jul 10 '19 17:07 Ralith

Was there any interest in a more robust and rigorous approach to determining affected users, as previously discussed?

Ralith avatar Aug 06 '19 04:08 Ralith

> Was there any interest in a more robust and rigorous approach to determining affected users, as previously discussed?

I have now added notes about MSC 1410: Rich Bridging, which provides exactly that. If it were already in The Spec, it would definitely be what should be used, and it is the nicer way to approach this imo. The obvious hurdle is that the MSC is not accepted yet and will probably need some more work.

Currently I am ranking the two benefits we would get from it as moderately important, but want to hear if others agree with my assessment.

A way forward could be to implement the current behavior and swap out the parts which rely on MSC 1410 when it is ready. I think adding the new changes would be backwards compatible, so they could just be tacked on later.

V02460 avatar Aug 06 '19 09:08 V02460