matrix-spec-proposals
Support for websockets
Author: @richvdh, @krombel
Documentation: https://github.com/matrix-org/matrix-doc/blob/master/drafts/websockets.rst, https://docs.google.com/document/d/104ClehFBgqLQbf4s-AKX2ijr8sOAxcizfcRs_atsB0g/edit
Date: 2015-11-16
We should let clients do sync requests (at least) over websockets.
There is a draft of how this might look at https://github.com/matrix-org/matrix-doc/blob/master/attic/drafts/websockets.rst.
[I thought we must have an issue for this, but couldn't find one].
There's a PR on synapse for this, although I couldn't find the spec issue either.
https://github.com/matrix-org/synapse/pull/2388
Should this be turned into a normal MSC PR?
Also, isn't it superseded by matrix-org/matrix-spec-proposals#2108?
Is this still not possible in Matrix today?
Nope. It's just never been much of a priority; it doesn't seem like it would make much difference in practice, but would come with a bunch of maintenance overhead.
If someone wanted to do a prototype and could demonstrate it made a real difference, that would be interesting, but that's unlikely to be me.
The difference would be that instead of repeatedly polling over HTTP, you'd have a stable connection through sockets. So the difference IN PRACTICE is that websocket messaging is faster. Considering Matrix is for instant messengers, I'm shocked to see polling instead of websockets.

EDIT: While this probably doesn't help the Matrix protocol as is, we plan on creating a sidecar service that'll forward the events through websockets, which can decrease the latency significantly. But that's a homeserver-specific solution, unfortunately.
It's not polling, it's long polling. The latency should be the same as websockets.
I'm not an expert in this, but long polling is very clearly slower. WebSockets maintain the same connection after a single handshake, whereas long polling requires you to make a new handshake every time. In this case, every time a new event arrives (such as a single person in a group chat starting to type), you have to make a handshake. If that doesn't make it slower and less scalable for clients the size of Telegram or WhatsApp, I don't know what does.
In this case, every time a new event arrives (such as a single person in a group chat starting to type), you have to make a handshake.
This is not quite true, as HTTP/1.1 uses keepalive by default, and HTTP/2 and 3 have a persistent connection that stays open. You would only have to send new HTTP headers and a body; no new handshake (neither TCP nor TLS) is required. But I agree that server-sent events (and maybe websockets, I'm not quite sure on that) would be more beneficial.
Long polling is slower. If two messages arrive 10ms after each other, an HTTP/1.1 server can send the first one immediately, but the second one must wait until the client sends a new set of headers, which takes one RTT. Processing those headers also takes a bunch of CPU cycles, on both sides.
(Haven't checked HTTP 2 and 3; I think they support some kind of 'hey client I think you'll want this soon', but I don't know if that's implemented server side. I've never heard of it used for API responses, I've only heard of it used for static resources.)
That said, long polling is still fully functional, and has a few advantages. It's easier to configure server-side (you need to configure HTTP access anyways), and it's easier to get past cranky corporate HTTP 1.0 proxies (though those are hopefully rare these days).
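The round-trip cost described above can be made concrete with a toy latency model. This is purely illustrative (the RTT and arrival times are made up, and the model ignores server processing time): under long polling, after each response the server must wait one RTT for the client's next request before it can deliver the next event, while an already-open websocket pushes each event after only a one-way trip.

```python
def delivery_times_long_poll(arrivals, rtt):
    """Toy model: delivery time (ms) of each event under long polling.
    A request is pending from t=0; after each response the next request
    arrives one RTT later (response travels down, new headers travel up)."""
    deliveries = []
    next_request_ready = 0.0
    for t in arrivals:
        send_at = max(t, next_request_ready)   # wait for a pending request
        deliveries.append(send_at + rtt / 2)   # one-way trip to the client
        next_request_ready = send_at + rtt     # response + fresh request
    return deliveries

def delivery_times_websocket(arrivals, rtt):
    """Connection already open: each event is pushed immediately."""
    return [t + rtt / 2 for t in arrivals]

# Two messages 10 ms apart, with an assumed 50 ms RTT:
lp = delivery_times_long_poll([0, 10], rtt=50)
ws = delivery_times_websocket([0, 10], rtt=50)
# The second message is delayed under long polling because it must wait
# for the client's next request to arrive.
assert lp[1] > ws[1]
```

The gap for the second message is roughly one RTT minus the inter-arrival time, which is the overhead the comment above describes.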
Long polling is slower. If two messages arrive 10ms after each other
I agree; that's why I think server-sent events would be a better fit than websockets. They are basically long polling, but without a new request every time a message arrives.
That said, long polling is still fully functional, and has a few advantages.
The neat thing is that those advantages would still be there, as server-sent events are just regular HTTP responses that work with all HTTP versions, in contrast to websockets, which are always TCP and therefore slower at connecting than, for example, HTTP/3.
SSE is one-way. It's simpler, but if you want to regularly reconfigure which events you receive (for example when switching which channel you're looking at, so you only get typing events you'll see), you'd have to tear down the entire SSE channel and set up a new one, or do that configuration via a completely different channel.
If that functionality is needed, websocket is the most natural option.
But Matrix does (to my knowledge) currently not have that functionality at all, and I'm not sure how high priority it is. And SSE is indeed easier to configure.
I have no strong opinion on which is better. Websocket is better for high-traffic accounts, but SSE is simpler to configure, and more resilient to flaky mobile internet (due to http3/quic roaming) and disobedient proxies. (I also think websocket has more libraries available, making it more accessible to third party client devs, but I didn't check.)
Maybe both. (Or maybe just pick one, there are more important things than three different polling mechanisms. Not my call.)
I will note three concrete advantages of SSE over WebSockets for sync, beyond even connectivity:

- Authorization can use the same flow in all modern browsers if `fetch` is used, simplifying backend implementations greatly. This is not the case for websockets, as they lack any way to expose raw upgraded connections to users.
- Compression in SSE can be applied transparently across the whole stream, deduplicating even across messages, and it even compresses almost all of the framing. None of this is the case for websockets with per-message deflate, making it too costly for small messages. (Note that you'd have to flush the stream server-side after every message, including flushing zlib state with `Z_SYNC_FLUSH`.) This will greatly reduce network overhead, useful for high-throughput servers.
- Restarting disconnected streams is supported at the protocol level in server-sent events. This is not the case for websockets.
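The compression point can be sketched with Python's `zlib`: one compressor shared across the stream lets later messages reference repeated JSON keys from earlier ones, and flushing with `Z_SYNC_FLUSH` after each message means the client can decode each event as soon as it arrives. The sample payloads are made up; only the flush mechanism is the point.

```python
import zlib

# Two events with heavily repeated structure (illustrative payloads).
messages = [
    b'{"type":"m.typing","content":{"user_ids":["@alice:example.org"]}}',
    b'{"type":"m.typing","content":{"user_ids":["@bob:example.org"]}}',
]

comp = zlib.compressobj()
decomp = zlib.decompressobj()
received = []
for msg in messages:
    # Z_SYNC_FLUSH emits all buffered output without resetting the
    # dictionary, so later messages still deduplicate against earlier
    # ones, yet each flushed chunk is immediately decodable.
    chunk = comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH)
    received.append(decomp.decompress(chunk))

assert received == messages  # each event decoded without waiting for the stream to end
```

Per-message deflate on websockets, by contrast, resets or window-limits state per message (depending on negotiation), which is why the overhead there is worse for small messages.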
Concretely, I could see /sync easily extended to stream events if it receives `Accept: text/event-stream`, and optionally compressing it if it receives `Accept-Encoding: deflate` or `Accept-Encoding: gzip`. `?since` would be ignored if the `Last-Event-Id` header is present, and `?timeout` would be ignored entirely.
And events could use the keys of what's in the current 200 response as event names and what would be returned in arrays as values, omitting what didn't change. Something like this:
- In place of `m.account_info` and `m.presence` types, the raw event is sent instead, with name = `event.type`, data = `event.content`.
- Type `m.device_lists`: data is a DeviceLists object
- Type `m.device_one_time_keys_count`: data is a `{string: integer}` object
- Type `m.rooms`: data is a Rooms object
- Type `m.to_device`: data is a ToDevice object
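The mapping above could be sketched as a helper that walks one /sync-style response and yields named events. The helper and the sample response body are hypothetical, written only to show the shape of the mapping; it covers raw presence events plus the structured sections listed above, omitting absent ones.

```python
import json

def sync_to_events(sync_body: dict):
    """Yield (event_name, data) pairs for one sync response, per the
    rough mapping sketched above. Sections not present are omitted."""
    # Raw presence events keep their own type as the SSE event name.
    for ev in sync_body.get("presence", {}).get("events", []):
        yield ev["type"], json.dumps(ev["content"])
    # Structured sections are forwarded whole under a fixed name.
    for key, name in [("device_lists", "m.device_lists"),
                      ("device_one_time_keys_count", "m.device_one_time_keys_count"),
                      ("rooms", "m.rooms"),
                      ("to_device", "m.to_device")]:
        if key in sync_body:
            yield name, json.dumps(sync_body[key])

# Hypothetical sync response with only two sections present:
body = {
    "presence": {"events": [{"type": "m.presence",
                             "content": {"presence": "online"}}]},
    "device_one_time_keys_count": {"signed_curve25519": 50},
}
events = list(sync_to_events(body))
# Only the sections that changed produce events; everything else is omitted.
```

Each `(name, data)` pair would then be written to the stream as an `event:`/`data:` frame.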
This is just a rough sketch, to be clear.