
Support TCP for protocol messages


What is the current behaviour and why should it be changed?

All Jamulus protocol (non-audio) messages are currently delivered over the same UDP channel as the audio. For most protocol messages this is fine, but those that send a list of servers from a directory, or a list of clients from a server, can generate a UDP datagram that is too large to fit into a single physical packet. Physical packets are constrained by the MTU of the Ethernet interface (normally 1500 bytes or less), and further by any limitations of links between hops on the internet. Neither the client nor the server has any control over these limitations. It's also possible that a large welcome message could require fragmentation.

The UDP protocol itself allows datagrams to be up to nearly 65535 bytes in size, minus any protocol overhead, and in theory IPv4 will allow nearly all of this size to be used. If the IPv4 datagram being sent by a node (host or router) is too large to fit into a single packet on the outgoing interface, the IP protocol will fragment the packet into pieces that do fit, with IP headers that contain the information needed to order and reassemble the fragments into a single datagram at the receiving end. For example, with a 1500-byte MTU and a 20-byte IPv4 header, each fragment can carry at most 1480 bytes of data, so a 10,000-byte datagram travels as seven fragments. Normally intermediate hops do not perform any reassembly, but they will further fragment an IP packet if it does not fit the MTU of the outgoing interface.

The receiving end needs to store all the received fragments as they arrive and can only reassemble them into the original datagram once all fragments have been received. The loss of even one fragment renders the whole datagram lost, and the remaining received fragments consume resources until they time out and are discarded. This also opens the door to a denial-of-service attack, in which an attacker deliberately sends large numbers of fragment sets with one or more fragments missing.

If a directory has more than around 35 servers registered (depending on the length of the name, city, etc.), the list of servers sent to a client when requested is certain to be fragmented. Similarly, if a powerful server has a lot of clients connected, e.g. a big band or large choir, the list of clients sent to each connected client can get fragmented. In either of these cases, a client that is unable to receive fragmented IP packets will show an empty list or an empty mixer panel.

There are several reasons that fragmented IP datagrams can fail to make it from server to client:

  • The configuration of a user's router, either accidentally or deliberately. Sometimes a user can be helped by a knowledgeable friend to check and fix this, but often not.
  • The configuration of an intermediate router along the path from server to client. This is fairly rare, but could be a carrier's deliberate choice to avoid the kind of DoS attack mentioned above. For whatever reason, it is outside the control of the user or server operator.
  • The IPv6 protocol deliberately has no provision for fragmentation of datagrams by intermediate routers at the IP layer, and fragments generated by the sending host cannot be relied upon to get through in practice. This is a complete show-stopper for the use of IPv6 in directories, as there is therefore no dependable support for large UDP messages.

The IPv6 limitation means that resolving this issue is a prerequisite to implementing IPv6 support in directories as per the ongoing discussion in https://github.com/orgs/jamulussoftware/discussions/1950.

Describe possible approaches

There is a longstanding discussion at https://github.com/orgs/jamulussoftware/discussions/1058 about the problems this issue is intended to solve, which mentions various approaches that have been tried or proposed:

  • Limiting the size of directories. This doesn't go far enough, and as mentioned above, a directory needs to be really small (less than 30 or so servers) to be sure of avoiding fragmentation.
  • Implementing "split" messages at the Jamulus protocol level using REQ_SPLIT_MESS_SUPPORT, SPLIT_MESS_SUPPORTED and SPECIAL_SPLIT_MESSAGE. I'm not sure whether such split messages are ever used in practice, and it appears that they only apply to connected messages, not the connectionless messages which are most at risk from fragmentation. In addition, the size of split parts is fixed, and not intelligently determined from any kind of path MTU discovery.
  • Having the directory also send a "reduced" server list with only the bare information of name, IP and port (CLM_RED_SERVER_LIST). This also fails to avoid the problem, as a directory list that may take around 7 fragments in its full form still takes around 3 fragments in its reduced form.
  • I experimented with zlib compression of server lists (https://github.com/orgs/jamulussoftware/discussions/1058#discussioncomment-8354688), but it only provides around 40% compression, not enough to avoid fragmentation.
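
For reference, that kind of measurement can be reproduced with Qt's built-in zlib wrapper, qCompress(). This is only a minimal sketch; the actual experiment in the linked discussion may have differed in detail:

```cpp
// Sketch: measure zlib compression of a serialised server-list message
// using Qt's built-in qCompress(). Around 40% reduction still leaves a
// large list spanning several IP fragments when sent over UDP.
#include <QByteArray>
#include <QDebug>

void ReportCompressionRatio ( const QByteArray& vecServerListMessage )
{
    const QByteArray vecCompressed = qCompress ( vecServerListMessage, 9 ); // 9 = best compression

    qDebug() << "original:"   << vecServerListMessage.size()
             << "compressed:" << vecCompressed.size()
             << "ratio:"      << ( 100.0 * vecCompressed.size() ) / vecServerListMessage.size() << "%";
}
```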

The only possible solution is to send some protocol messages using TCP instead of UDP, when talking to a compatible client. UDP would still be available for backward compatibility when talking to older clients or older servers.

There are two kinds of protocol message that each need to be handled differently:

  • Connectionless messages CLM_*. These are unrelated to a channel (with one exception). They are mainly used by a client to fetch information for the Connect dialog:
    • List of servers from a directory (CLM_REQ_SERVER_LIST).
    • List of connected clients from a server (CLM_REQ_CONN_CLIENTS_LIST).
    • Small messages such as requests, ping, version and OS, register and unregister server. These are small enough never to need fragmentation.
  • Channel-specific messages. These need to be related to a connected channel on the server. Currently, they are identified by the IP:port of the client end.

Connectionless Messages

For connectionless messages, the client can send a TCP connection request to the server, with a timeout. If the server supports TCP, this connection will be accepted and the client can then send the CLM_REQ_* message over the TCP connection. The server needs to interpret the message and send the response back over the same TCP connection. The client can then close the connection or leave it open for sending another message (tbd). If the TCP connection from the client is refused or times out (probably due to a firewall dropping the connect request), the client can fall back to the existing UDP usage to send the request. For this reason, the TCP connection timeout will need to be short, something like 2 seconds. This will be plenty of time for a compatible server to answer.
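
As an illustration, here is a minimal sketch of that fallback using Qt's QTcpSocket. The blocking waitFor* calls are used only for brevity (real code would use the connected() and error signals), and SendClmRequestViaUdp() is a hypothetical stand-in for the existing UDP transmit path:

```cpp
// Sketch of the client-side TCP-with-UDP-fallback for connectionless messages.
#include <QByteArray>
#include <QHostAddress>
#include <QTcpSocket>

// hypothetical stand-in for the existing UDP path
void SendClmRequestViaUdp ( const QHostAddress& inetAddr, quint16 iPort, const QByteArray& vecMessage );

static const int TCP_CONNECT_TIMEOUT_MS = 2000; // short, so the UDP fallback is quick

void SendClmRequest ( const QHostAddress& inetAddr, const quint16 iPort, const QByteArray& vecMessage )
{
    QTcpSocket tcpSocket;
    tcpSocket.connectToHost ( inetAddr, iPort );

    if ( tcpSocket.waitForConnected ( TCP_CONNECT_TIMEOUT_MS ) )
    {
        // compatible server: send the CLM_REQ_* message over TCP and
        // read the reply on the same connection (framing: see below)
        tcpSocket.write ( vecMessage );
        tcpSocket.waitForBytesWritten ( TCP_CONNECT_TIMEOUT_MS );
        // ... read the response, then close or reuse the connection
        return;
    }

    // refused or timed out (e.g. a firewall dropping the connect request):
    // fall back to the existing UDP transport
    SendClmRequestViaUdp ( inetAddr, iPort, vecMessage );
}
```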

I have a branch that implements the server side of connectionless messages over TCP, currently just for CLM_REQ_SERVER_LIST and CLM_REQ_CONN_CLIENTS_LIST, but others could be added as needed. It can be seen at https://github.com/softins/jamulus/tree/tcp-protocol. It is necessary to pass the TCP socket pointer via the function calls, signals and slots, to the point at which the response message can be sent. If this socket pointer is nullptr, the response will be sent over UDP as at present; otherwise it will be sent to the referenced TCP socket.
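
In outline, the dispatch at the point of sending could look like the following sketch (the function name and parameters here are illustrative, not the actual signatures in the branch):

```cpp
// Sketch: route a protocol response over TCP when the request arrived
// on a TCP connection (non-null socket pointer), otherwise over UDP
// exactly as at present.
#include <QByteArray>
#include <QHostAddress>
#include <QTcpSocket>
#include <QUdpSocket>

void SendProtocolResponse ( QUdpSocket*         pUdpSocket,
                            QTcpSocket*         pTcpSocket,
                            const QHostAddress& recHostAddr,
                            const quint16       iPort,
                            const QByteArray&   vecMessage )
{
    if ( pTcpSocket != nullptr )
    {
        // stream transport: no datagram size limit, no IP fragmentation
        pTcpSocket->write ( vecMessage );
    }
    else
    {
        // existing behaviour: one UDP datagram to the requester
        pUdpSocket->writeDatagram ( vecMessage, recHostAddr, iPort );
    }
}
```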

Note that due to the variable size of Jamulus protocol messages and the stream-oriented nature of TCP sockets, the receiver at each end must first read the fixed-size header (9 bytes), determine the payload size from that header, and then read the payload plus two more bytes for the CRC.
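
A sketch of that de-framing loop, as it might look in a readyRead handler with a per-connection receive buffer. GetPayloadSizeFromHeader() and ParseMessage() are hypothetical stand-ins for the real header parsing and message handling:

```cpp
// Sketch of de-framing Jamulus messages arriving on a TCP stream.
#include <QByteArray>
#include <QTcpSocket>

static const int MESS_HEADER_SIZE = 9; // fixed-size message header, as described above
static const int MESS_CRC_SIZE    = 2; // trailing CRC bytes

int  GetPayloadSizeFromHeader ( const QByteArray& vecHeader ); // hypothetical
void ParseMessage ( const QByteArray& vecCompleteMessage );    // hypothetical

void OnTcpReadyRead ( QTcpSocket* pSocket, QByteArray& vecBuffer )
{
    // accumulate whatever has arrived; TCP preserves no message boundaries
    vecBuffer.append ( pSocket->readAll() );

    // one read may hold a partial message, or several complete ones
    while ( vecBuffer.size() >= MESS_HEADER_SIZE )
    {
        const int iPayloadSize = GetPayloadSizeFromHeader ( vecBuffer );
        const int iTotalSize   = MESS_HEADER_SIZE + iPayloadSize + MESS_CRC_SIZE;

        if ( vecBuffer.size() < iTotalSize )
        {
            return; // wait for the rest of the message to arrive
        }

        ParseMessage ( vecBuffer.left ( iTotalSize ) );
        vecBuffer.remove ( 0, iTotalSize );
    }
}
```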

I have tested it using a Python client, based on @passing's jamulus-python project, but enhanced to support TCP. See https://github.com/softins/jamulus-python/tree/tcp-protocol.

The next step is to add to Jamulus the client side of using TCP for connectionless messages to fetch server and client lists.

Connected Channel Messages

For connected channel messages, the situation is a little more complicated. The following factors must be considered:

  • The list of connected clients is sent to a participating client using a connected channel message (CONN_CLIENTS_LIST).
  • Each time someone else connects to or disconnects from the server, the server sends an unsolicited CONN_CLIENTS_LIST to each client that is still connected. For a large, busy server with many clients, this can be a long message that is, at present, subject to UDP fragmentation. It should therefore be sent over TCP where possible.
  • A server cannot initiate a TCP connection to a client. Therefore the client needs to open a TCP connection to the server at the beginning of the session, and keep the connection open continuously until leaving the session. This connection should be used by the server to send the updated client lists to the client.
  • If a client has both a TCP and a UDP connection to the server, there is no way for the server to relate the two connections just by IP and port number, as the source ports will not be related to each other. Even if the client were to bind both TCP and UDP sockets to the same local port number, they could get mapped independently to different ports by a NAT router in the path.

My proposal to solve the last point above is as follows:

  • The client starts the session in the same way as at present, by sending an audio stream.
  • On receiving the new audio stream, the server searches by IP:port (CHostAddress) for a matching channel (in CChannel vecChannels[]), and on not finding a match, allocates a free channel in that array for the new client. It stores the CHostAddress value in the allocated channel, and returns the ID (index) of the new channel.
  • The server immediately sends this ID to the client as a CLIENT_ID connected channel message. This is all existing behaviour so far.
  • A TCP-enabled client, when it receives this CLIENT_ID message, will initiate a TCP connection to the server. If the connection attempt fails, the client will assume the server is not TCP-enabled and will not retry. Operation will continue over UDP only as at present.
  • If the TCP connection succeeds, the client will immediately send a CLIENT_ID message to the server, specifying the client ID that it had just received from the server. This will enable the server to associate that particular TCP connection with the correct CChannel, and the server will store the pTcpSocket pointer in the CChannel (see the sketch after this list).
  • The server can then easily send the CONN_CLIENTS_LIST updates to the client over TCP if the socket pointer in the channel is not null, or otherwise over UDP. It could also send the welcome message over the same TCP socket, improving support for longer welcome messages.
  • Other connected channel messages that are not size-critical could be sent over either UDP as at present, or the TCP connection. This is open for discussion.
  • When the client wants to disconnect from the channel, it will send a CLM_DISCONNECTION the same as at present (over either UDP or TCP), but will also close any open TCP connection to the server.
  • Over UDP, connected channel messages need to be acked, and will be retried if the ack is not received. This is necessary due to the lack of guaranteed delivery in UDP. Over the TCP socket, which provides guaranteed delivery, it would be possible to send messages without queuing or needing acks, and this might simplify implementation. Comments?
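
To make the association step concrete, here is a minimal sketch of the server-side handler that would run when a CLIENT_ID message arrives on a newly accepted TCP connection. Apart from vecChannels and the proposed pTcpSocket member, all names here are illustrative:

```cpp
// Sketch: associate a newly accepted TCP connection with the client's
// existing channel, keyed by the client ID echoed back over TCP.
// CChannelStub models just the parts of CChannel relevant here.
#include <QTcpSocket>

struct CChannelStub
{
    bool        bIsConnected = false;   // stands in for CChannel::IsConnected()
    QTcpSocket* pTcpSocket   = nullptr; // new member proposed above
};

static const int MAX_NUM_CHANNELS = 256; // illustrative bound

CChannelStub vecChannels[MAX_NUM_CHANNELS];

void OnTcpClientIdMessage ( QTcpSocket* pSocket, const int iChanID )
{
    // validate the echoed ID before trusting it
    if ( ( iChanID < 0 ) || ( iChanID >= MAX_NUM_CHANNELS ) || !vecChannels[iChanID].bIsConnected )
    {
        pSocket->disconnectFromHost();
        return;
    }

    // store the socket in the channel; from now on, size-critical
    // messages such as CONN_CLIENTS_LIST can go to this client over TCP
    vecChannels[iChanID].pTcpSocket = pSocket;
}
```

Because any host could open a TCP connection and echo an arbitrary ID, the server should probably also verify that the TCP peer address matches the IP address already stored in the channel's CHostAddress before accepting the association; the ports will differ, but the source IP would normally be the same even behind NAT.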

I have not yet implemented any of this connected channel functionality, beyond adding the pTcpSocket pointer to the CChannel class.

This is currently a work in progress, as described above. The purpose of this issue is to allow input from other contributors on the technical details mentioned above, and to keep the topic visible until the code is ready for a PR.

The expectation is that all the public directories will support TCP connections. This will also need suitable firewall rules at the servers. However, clients implementing all the above will still be backward-compatible with older directories and servers run by third parties. Similarly, older clients connecting to newer directories and servers will continue to operate as at present over UDP, with no use of TCP required.

Has this feature been discussed and generally agreed?

See the referenced discussion at https://github.com/orgs/jamulussoftware/discussions/1058 for history. I would value comments within this Issue regarding the solution I am proposing above. @pljones @ann0see @hoffie @dtinth and any others interested.

softins · Feb 27 '24 18:02