CrewLink icon indicating copy to clipboard operation
CrewLink copied to clipboard

Add support for TURN server and allow configurable STUN server

Open sjoerd108 opened this issue 3 years ago • 33 comments

What is this?

CrewLink uses peer to peer connections for most of its traffic between clients. This is great because it keeps the voice server's load to a minimum. That said, creating peer to peer connections isn't that easy on the modern internet due to NAT, firewalls and strict network admins. Unless you are on a local LAN, chances are you need a way to figure out your public IP address and port to establish a peer to peer connection. That's where STUN comes in and currently CrewLink is hard-coded to use one of Google's STUN servers. STUN works in most cases, but some people find themselves behind a strict NAT/firewall that doesn't allow peer to peer connections some way or another. This is where TURN comes in. TURN acts as a relay between peer to peer clients, bypassing the NAT entirely.

CrewLink currently does not support configuring a TURN or custom STUN server. This pull request aims to address that. The existing libraries CrewLink uses already support using these servers, so all it introduces is the required UI to make it user-configurable. It does this by adding an "Advanced Settings" tab near the bottom of the settings menu that is collapsed by default. People hosting their own servers opting to add a TURN/STUN server can then instruct users to configure these settings.

Why?

I'm a big fan of CrewLink and enjoy playing Among Us with it with many people. Some people are, sadly, unable to use it due to their ISP/network admins making it hard to establish peer to peer connections. Testing on a custom build that supports a TURN server has dropped the amount of people not being able establish a connection with others to 0 in the places I've tested. I believe most people that can set up a crewlink server are also able to set something up like Coturn greatly increasing the people they can play with.

sjoerd108 avatar Dec 05 '20 22:12 sjoerd108

As-is, this adds too much stress to the user. Please reconfigure this so the voice server can optionally send a TURN server over websocket, to make the entire process automatic.

ottomated avatar Dec 05 '20 23:12 ottomated

I see. I had some small concerns that it may be too easy to gain access to the TURN server if it was implemented like that, but I suppose one can rotate the TURN keys fairly regularly to solve that.

I'll update this PR and get another PR ready for CrewLink-server.

sjoerd108 avatar Dec 06 '20 00:12 sjoerd108

I've implemented the requested changes and opened a PR for CrewLink-server as well. Both are ready for review.

sjoerd108 avatar Dec 07 '20 01:12 sjoerd108

LGTM

MattJustMatt avatar Dec 07 '20 02:12 MattJustMatt

hey, is this problem fixed now.. or?

marss5 avatar Dec 07 '20 15:12 marss5

I think I am having an issue because of this

ChaoticWeevil avatar Dec 08 '20 08:12 ChaoticWeevil

I think I am having an issue because of this

If you are testing this as a dev build, could you please include some logs and specify how you've configured the peer-config on the server? This isn't in the official release yet.

sjoerd108 avatar Dec 08 '20 18:12 sjoerd108

I think I am having an issue because of this

If you are testing this as a dev build, could you please include some logs and specify how you've configured the peer-config on the server? This isn't in the official release yet.

I know I had some issues. The first game in a lobby ran great, with no issues whatsoever. Once the game ended and you were on the victory/defeat screen issues started popping up. About 50% of the time at least 1 person wasn't able to hear someone else, and the person they couldn't hear wasn't able to hear them back. This looked to be from the developer console to be just a connection refused from RTCPeerConnection. Seems there is also DOMExecution, and signaling errors after a game ends. About 20% of the time about half of the lobby would lose the ability to hear or speak. I grabbed the logs from one of the test sessions and it can be found here.

The peer-config is set to force the relay, use Google as the STUN server, and then a custom TURN server setup using coturn on a DigitalOcean droplet. I can confirm the TURN server works with both Trickle ICE and through the game since the first lobby works and my droplet's bandwidth spikes. If you'd like the info for the TURN server I can also provide that.

mja00 avatar Dec 08 '20 18:12 mja00

@mja00 Thanks for this, I'll give those logs a good look. I haven't had those issues during testing of my own, but I'll see what I can do to reproduce them. Without giving sensitive info about your TURN server away, could you specify what, if anything, you changed about Coturn's config (aside from credentials), I'd like to replicate the setup if possible.

sjoerd108 avatar Dec 08 '20 18:12 sjoerd108

@sjoerd108 Sure, my Coturn config is as follows

server-name=turn.mja00.com
realm=turn.mja00.com
cert=/etc/letsencrypt/live/turn.mja00.com/cert.pem
pkey=/etc/letsencrypt/live/turn.mja00.com/privkey.pem
fingerprint
listening-ip=0.0.0.0
listening-port=3478
min-port=10000
max-port=20000
log-file=/var/log/turnserver.log
verbose
user=fakeuser:fakecreds
lt-cred-mech

With turn:turn.mja00.com:3478 set as the turn server in peer-config. Looks like turn:turn.mja00.com:5349 also works as an IP.

mja00 avatar Dec 08 '20 18:12 mja00

@mja00 Thanks. I'll replicate those settings and try to do another test run with a full/almost full lobby. I wasn't running it with TLS as WebRTC already encrypts the traffic, but I'll try adding some certs though I doubt it'll make the difference.

sjoerd108 avatar Dec 08 '20 18:12 sjoerd108

@sjoerd108 No problem, TLS probably doesn't make any difference and it honestly might be weirdness with a basic DO droplet. I have some different hardware I can spin a TURN server up on if needed.

mja00 avatar Dec 08 '20 18:12 mja00

I'd just like to add, this issue about connections degrading after the first game I believe isn't unique to this specific implementation (I think it has a deeper cause). I also have a separate branch with relay-only for my private games and was also running into this. I think error handling (for error events from simple-peer) and some improvements to connection flow are the solution.

MattJustMatt avatar Dec 08 '20 20:12 MattJustMatt

@mja00 I have to side with @MattJustMatt here. We did two test runs, one with this version, replicating your TURN server config and the latest official version (1.1.5). When running the TURN version we saw several instances of ERR_DATA_CHANNEL being thrown, the connection did not die though. With the latest official build we not only saw ERR_DATA_CHANNEL but also "Cannot signal after peer is destroyed". After the latter error happened one participant indeed stopped being heard. I'm fairly certain if either version was being used long enough both of these errors could occur.

I believe it's safe to say the errors you are seeing are unrelated to this feature.

sjoerd108 avatar Dec 08 '20 22:12 sjoerd108

@sjoerd108 Glad to hear it wasn't the feature then, looks like I'm gonna have to understand the codebase and delve into some NodeJS to fix some of those errors. 😛 I appreciate checking out the errors and confirming it wasn't the feature.

mja00 avatar Dec 08 '20 22:12 mja00

I see. I had some small concerns that it may be too easy to gain access to the TURN server if it was implemented like that, but I suppose one can rotate the TURN keys fairly regularly to solve that.

I'll update this PR and get another PR ready for CrewLink-server.

This could be solved using the Short-Term Credential Mechanism if the STUN server was included as part of CrewLink-server (e.g. via node-turn). The CrewLink-server could add users at runtime using their id & a generated password to allow them to authenticate against the STUN server, removing the user as they disconnect. Would also save hosts the hassle of having a seperate process running to provide proper sound quality to players behind strict NAT/firewall.

NewYearNewPhil avatar Dec 12 '20 23:12 NewYearNewPhil

That's a good point, it'd solve the problem of secrecy surrounding TURN credentials. I did have a look at using Node TURN my only concern is that it wouldn't perform that well compared to a native implementation. Though for small servers that shouldn't be a problem.

sjoerd108 avatar Dec 12 '20 23:12 sjoerd108

This would be a great feature, and it's pretty simple to do a TURN server with authentication backed by the socket.io ids, I actually did one in Go. It's actually pretty vital to ensure everyone is able to use CrewLink without issues.

That's a good point, it'd solve the problem of secrecy surrounding TURN credentials. I did have a look at using Node TURN my only concern is that it wouldn't perform that well compared to a native implementation. Though for small servers that shouldn't be a problem.

It seems like this would be a community-only feature, so a built in TURN server in Node would be plenty for the 10-20 people a regular server would have on it, though the bigger servers would probably need something better. I think coturn has a REST API to create credentials, so maybe a docker stack/docker compose file for deploying a TURN server as well as a simple integration into the node version? A pluggable backend/setting to either run a built-in TURN server or use the API would be cool.

tystuyfzand avatar Dec 13 '20 23:12 tystuyfzand

@tystuyfzand Docker would solve it quite nicely, yeah. I'm going to look into at least doing the built-in node-turn thing as it would indeed greatly increase the amount of people being able to play. This''ll be mostly server-side but I'll make both PRs drafts for now.

sjoerd108 avatar Dec 13 '20 23:12 sjoerd108

@sjoerd108 I'd say this side of the PR is complete, as it adds a very simple way to modify the ICE configuration. I think this should be ready to pull, minus any review on the styling/whatever - it'd let the server/community side evolve and make the app even better.

tystuyfzand avatar Dec 14 '20 00:12 tystuyfzand

@tystuyfzand I think I may introduce some changes to the peer-config from the server, which may affect validation on this side.

sjoerd108 avatar Dec 14 '20 00:12 sjoerd108

Hey, any news on the PR ? This features looks absolutely awesome !

Flyy-y avatar Dec 15 '20 19:12 Flyy-y

@Flyy-y It's coming along quite nicely. Should have an update ready for review not too long from now.

sjoerd108 avatar Dec 15 '20 22:12 sjoerd108

See https://github.com/ottomated/CrewLink-server/pull/28#issuecomment-753318981

sjoerd108 avatar Jan 01 '21 13:01 sjoerd108

Earlier we had a lot of peer-to-peer connection issues when playing, so I set up a coturn server, and been using a custom build of this branch for a couple of rounds, and this TURN patch seems to solve all of the issues we had related to connection.

Would love to see this patch end up in the official release

pengi avatar Jan 02 '21 10:01 pengi

Please Create new Release after merging this

mudassirzulfiqar avatar Jan 08 '21 13:01 mudassirzulfiqar

So I've been reading through this PR and I love this enhancement to CrewLink. The one thing I notice, is that there is no retry if the Peer to Peer connection fails. It would be useful for the server to provide a list of STUN/TURN servers and the client go through them until it finds one it can connect to with no issues.

Ideally, it would be preferred to use an open STUN server like the current Google provided STUN server that is set as the default; followed by a list of one or more additional STUN or TURN servers the client can attempt to connect to.

Having said that, I haven't specifically tested this PR yet, so if this is functionally how it works today, hooray and thanks for all of your hard work. If it is not, please consider adding a retry with exponential backoff and timeout so that you might be able to cycle through a list of servers instead of using just one.

billyjbryant avatar Jan 11 '21 19:01 billyjbryant

@billyjbryant This is pretty much already here. If the server does not send a client peer config then the client will opt the use one of Google's stun servers by default. The server, too, will send Google's STUN server by default if no peer behaviour is specified. As for retry logic, that is something that has been implemented and merged in PR #382.

sjoerd108 avatar Jan 11 '21 20:01 sjoerd108

I would like to ask, if it's not a problem, if there is a way to use Crewlink without Peer to Peer. Because, for all of my friends, here in italy we have a Strict NAT type. So It would be really helpfull to have a "main" server that handle all the connections without using Peer to peer.

If there is a way to use it then let me know, there's no problem for me to setup a server and such.

amgno avatar Jan 12 '21 10:01 amgno

I would like to ask, if it's not a problem, if there is a way to use Crewlink without Peer to Peer. Because, for all of my friends, here in italy we have a Strict NAT type. [...]

That's the problem we have here too. Some NAT gateways can track connections and some don't. And since the initiator of the connection is the last person to reload (or if it was the other way around? At least it depends on loading order), the connection problems is really erratic. So having the server not behind NAT (I'm using a smaller paid VPS from a hosting service) really helps.

So this patch will, given a crewlink server with matching patch and proper opt-in configuration, make it possible to communicate with just one server, called TURN server. However, it will still be one audio stream between each player, just transmitted though the server. So I was looking at some traffic graphs for the TURN setup I'm using, and for a game of 8 people (I think) was about 3.2Mbps traffic in each direction on the server.

I overheard a short discussion on ottomateds twitch stream when he was discussing this issue, and yes, it will be require a significant increase in server requirements to use a TURN server, and would therefore not be suitable for publicly hosted servers. But it is possible to use a private server where a group of players can get together and share their cost if a TURN server is requiered.

pengi avatar Jan 12 '21 11:01 pengi