hub.bugout.link performance

Open draeder opened this issue 4 years ago • 41 comments

Hi Chris:

I've been building a tracker server tester and as part of that I built my own tracker server. I wanted to share some observations that might help you understand the performance issues you've been experiencing with hub.bugout.link.

Short Summary

Your tracker server got added to a list of tracker servers that is used widely by users in China who use an app called BitComet. The users in China are very likely killing your server. This issue was opened by someone to have your tracker server added to trackerslist. You can open an issue to have your server removed, too, per the trackerslist readme.

I know this to be the reason for your server's performance issues by reviewing https://hub.bugout.link/stats and seeing that it matches my own server's stats after personally requesting my server be added to trackerslist, in particular the type of client showing up in the stats: BitComet.

Details

While testing my tracker server tester, I found that your server often responds with a 400 Bad Request. I remember you mentioning that SlingCode introduced extra load to your server, and at the time, that was the explanation for the performance problems.

However, I am starting to see the same kinds of problems with my tracker server, and I don't have any major applications built that use it yet.

My tracker server was built to run on Heroku since it has a free plan and also a low price plan. After getting things working, I opened an issue with trackerslist to have my tracker server added to the list -- I was curious to see how it would perform with many users. Once it was added, I noticed that my Heroku server was generating a lot of errors and the response time was averaging 30s(!). I've gotten 3.5 million requests to my server and the majority of them are from users in China using BitComet.

Since I am using Cloudflare for the CNAME associated with my domain and the Heroku app link, I was able to identify the origin of the traffic with Cloudflare's web analytics. The majority of traffic was coming from China. So, I added a firewall rule to block all traffic from China. This solved the performance issue for about 24-48 hours, bringing the response time of my server back to single-digit milliseconds. But today it's getting pegged again despite the firewall rule. I have a case open with Cloudflare support to explain why only some traffic from China is getting blocked and other traffic is let through.

In the end, I would like to block certain types of clients, or only allow certain types of clients like WebTorrent. It seems that bittorrent-tracker could allow this, since the stats page lists the connected clients, but it's not clear from the documentation how to do that.
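One way to picture the allow-list idea is to look at the peer ID each client sends when it announces. The sketch below is only an illustration: it assumes Azureus-style peer IDs (a dash, a two-letter client code, four version characters, a dash), the names `client_code`, `is_allowed`, and `ALLOWED_CLIENTS` are hypothetical, and how such a check would be hooked into bittorrent-tracker is not shown here. Peer IDs are self-reported, so a client can spoof them.

```python
import re

# Azureus-style peer IDs look like b"-WW0001-" followed by 12 more bytes:
# a dash, a two-letter client code, four version characters, a dash.
AZUREUS_STYLE = re.compile(rb"^-([A-Za-z]{2})([0-9A-Za-z]{4})-")

# Hypothetical allow-list; "WW" is the client code WebTorrent uses.
ALLOWED_CLIENTS = {"WW"}


def client_code(peer_id: bytes):
    """Return the two-letter client code, or None if the ID is not Azureus-style."""
    match = AZUREUS_STYLE.match(peer_id)
    return match.group(1).decode("ascii") if match else None


def is_allowed(peer_id: bytes, allowed=ALLOWED_CLIENTS) -> bool:
    """Accept an announce only when the client code is on the allow-list."""
    return client_code(peer_id) in allowed
```

With this allow-list, a peer ID starting with `-WW` passes, while one starting with `-BC` (the code BitComet uses in this style) or one that is not Azureus-style at all is rejected.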

I hope you find this information useful.

Thanks, Dan

draeder avatar Feb 25 '21 03:02 draeder

@draeder thank you so much for this analysis, that is indeed useful information. Will have a bit of a think about how to handle this.

chr15m avatar Feb 25 '21 03:02 chr15m

@chr15m I personally want users from a restricted country like China to be able to use my tracker server however they need to, and ideally I wouldn't block any clients.... But the demand from China is so high, it leads to either performance issues or costs. I would love to explore a solution that addresses both....

draeder avatar Feb 25 '21 04:02 draeder

@draeder i have designed but not tested a possible solution to this using a "proof-of-work auction" or hashcash auction. Basically you give your server a limited number of slots which it is able to support, and clients have to perform proof-of-work of a sufficient difficulty to connect. This ensures that the server is never overloaded, and also that clients are given a signal about which server they should connect to (lower PoW is better). I've started preliminary work on this, will let you know if/when I make progress. Let me know if you come up with anything yourself.
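For readers unfamiliar with hashcash, here is a minimal sketch of the proof-of-work part only. It is not the design described above, which has not been published in this thread; the auction and slot accounting are left out. The choice of SHA-256, the challenge format, and the function names are all assumptions made for illustration.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check a client's claimed work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: try nonces until the hash has enough leading zero bits."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point: `solve` needs about 2^difficulty hashes on average, while `verify` needs one, so the server can check work cheaply before spending resources on a connection.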

chr15m avatar Feb 25 '21 04:02 chr15m

@draeder another alternative I've been exploring is to tie cpu load / memory consumption to the PoW value. So as the load goes up and memory becomes scarce, the PoW clients must perform becomes more difficult, warding them off.
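As a rough illustration of tying difficulty to load, the sketch below maps a load fraction to a required number of leading zero bits. The linear mapping, the default bounds, and the name `difficulty_for_load` are assumptions, not something taken from an existing implementation. The load value could come from CPU, memory, or any other measure the server trusts.

```python
def difficulty_for_load(load: float, base_bits: int = 8, max_bits: int = 24) -> int:
    """Map a load fraction in [0, 1] to a required number of leading zero bits.

    Values outside [0, 1] are clamped. Each extra bit roughly doubles the
    expected work a client has to do before it may connect.
    """
    clamped = min(max(load, 0.0), 1.0)
    return base_bits + round(clamped * (max_bits - base_bits))
```

With the defaults, an idle server asks for 8 bits, a half-loaded one for 16, and a saturated one for 24.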

chr15m avatar Feb 25 '21 04:02 chr15m

@chr15m If you need me to test anything you've created, I can add new servers to Heroku as necessary with little to no cost as long as those servers don't live too long. Just let me know.

What I've been seeing with Heroku, though, is that CPU and memory don't really increase as the requests increase. The issue is that the number of errors increases as peers get connected together, and that seems to create the increase in time to respond to requests. Specifically, as peers connect, sockets are no longer needed, so the server responds with 503 Service Unavailable due to the websockets timeout. It's unnecessary use of CPU time to handle/respond to those timeouts.

In Heroku, there's no way to address that... but there must be something that can be done in Node.js.

draeder avatar Feb 25 '21 04:02 draeder

@draeder huh, that's interesting. So it sounds more like the sheer number of websockets is the issue. There could also be memory leaks etc. in the server itself. I think I remember @DiegoRBaquero saying their tracker was restarted every 24 hrs, so maybe I should do the same with hub.bugout.link. :thinking:

chr15m avatar Feb 25 '21 05:02 chr15m

@chr15m Well that's interesting, because I was talking with the developer of fake-bittorrent-client about how to address timeout issues for my tracker server tester ... In that issue I found that limiting the number of sockets for the client request helped speed up responses. Looking at it from the other direction may be useful.

draeder avatar Feb 25 '21 05:02 draeder

Also, there is a limit to dynamically assigned ports that can be opened for any computer using IPv4... https://stackoverflow.com/questions/113224/what-is-the-largest-tcp-ip-network-port-number-allowable-for-ipv4

draeder avatar Feb 25 '21 05:02 draeder

@draeder ah yes, and all of these issues point to the benefit of there being more trackers that each handle a smaller load individually.

chr15m avatar Feb 25 '21 05:02 chr15m

More and smaller tracker servers that are part of a mesh also means the whole system is more robust to single individual trackers going offline.

chr15m avatar Feb 25 '21 05:02 chr15m

Right... so how to get them participating is the question? I made P2P Tracker so anyone could run a tracker server either locally or in Heroku.. The trouble is, who will run it? My tracker server tester is finding very few responsive trackers. Yours and mine are in the list of working trackers..... unless China is killing our servers.

Your tracker server mesh idea is important, but people need to run their own trackers, first -- then run those tracker servers within the mesh....

draeder avatar Feb 25 '21 05:02 draeder

> how to get them participating is the question?

I think the only thing you can do is put a thing out there and tell people about it. If it is valuable and you have explained the value, people will run it. If not, try again.

chr15m avatar Feb 25 '21 08:02 chr15m

Why not just remove the tracker from the index or lower its position in the index?

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

What is this new suicidal application? I saw it yesterday, but didn't think about it.

迅雷在线 (Xunlei) 0.1.0.0 : 51
迅雷在线 (Xunlei) 0.0.1.2 : 12
BitComet 1.73 : 10
BitComet 1.76 : 11
BitComet 1.75 : 22
BitComet 1.74 : 13
BitComet 1.77 : 18
BitComet 0.58 : 1
Transmission 3.00 : 2
Vuze 5.7.6.0 : 3
BitSpirit 3.6.0 : 3
WebTorrent 0.0 : 1

514 active

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

Huh. Seems like a badly done P2P CDN. Chinese in origin, commercial.

https://en.wikipedia.org/wiki/Xunlei

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

I'll try making a magnet link only using Bugout and see what I get.

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

Server works to a point. Managed to get https://instant.io/#magnet:?xt=urn:btih:36c36245e2e7f813efef4d2908ab65920a8dd212&dn=beaker-browser.exe&tr=wss%3A%2F%2Fhub.bugout.link to work, saw myself on the stats page. Server 500 soon after. Seems like server is unstable, @chr15m @draeder . Maybe drop a few connections when one client refuses to seed and has >30 users.

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

147 peers. You could try a cron job that uses curl to get the stats and logs them, to keep track. @chr15m
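A small script run from cron could do the same job as curl plus a log file. The sketch below assumes the stats body contains plain-text lines of the form `Client version : count`, like the listing pasted earlier in this thread; the real page may be HTML and need different parsing. The function names and the log file name are hypothetical.

```python
import time
import urllib.request

STATS_URL = "https://hub.bugout.link/stats"  # URL mentioned in this thread


def parse_client_counts(text: str) -> dict:
    """Collect 'Client version : count' lines into a dict, ignoring other lines."""
    counts = {}
    for line in text.splitlines():
        name, sep, count = line.rpartition(" : ")
        if sep and count.strip().isdigit():
            counts[name.strip()] = int(count)
    return counts


def log_line(text: str, now: float) -> str:
    """Format one timestamped summary line for the log."""
    counts = parse_client_counts(text)
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return f"{stamp} total={sum(counts.values())} clients={len(counts)}"


def fetch_and_log(path: str = "tracker-stats.log") -> None:
    """Fetch the stats page once and append a summary line to the log file."""
    with urllib.request.urlopen(STATS_URL, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
    with open(path, "a", encoding="utf-8") as log:
        log.write(log_line(body, time.time()) + "\n")
```

Calling `fetch_and_log()` from a script scheduled every few minutes would build up a history of listed peer counts over time.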

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

Fair warning @chr15m Going to try to flood with connections from WebTorrent, try to get it to rebalance.

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

You should see a lot of 67...* WebTorrent users now, @chr15m

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

Hit 167 during stress testing, nearly wiped out my device. Seems like as soon as I backed off, more peers joined and filled the gap. @chr15m Now at 152

hello-smile6 avatar Jun 06 '21 18:06 hello-smile6

@hello-smile6 thank you for your testing and data. Under the current scheme where trackers can be flooded at no cost to the user, any mitigation is only going to be a temporary hack. I'm working on a more permanent solution to this problem in my spare time but I do not have anything to show yet. In the meantime, people should run their own trackers if they want better performance.

chr15m avatar Jun 07 '21 00:06 chr15m

Okay. Can you deploy dozens to various free services such as Glitch and Heroku and use DNS for load balancing?

hello-smile6 avatar Jun 13 '21 00:06 hello-smile6

@hello-smile6 That's up to you. I have not had any issues deploying my own tracker server to Heroku. The issues with my server popped up when my server address was available to those who wanted to use it for other reasons than my app.

draeder avatar Jun 13 '21 02:06 draeder

@hello-smile6 By the way, it just occurred to me that I was working on something similar to what you suggested with DNS. I have the repo set to private since I was doing a bunch of testing with it. Basically it uses Hyperswarm to create a backend server swarm for all trackers for the given app. Then, if you have a domain and host its DNS in Cloudflare, it updates a TXT record with the ws trackers. That gets passed along to all of the server peers. In that way, if any user joins one of the servers, their browser gets the list of servers from the TXT record and the browser seeds all of the servers with its peer address. It's still a work in progress, as I mentioned. But the idea is close to what you were suggesting.

draeder avatar Jun 13 '21 14:06 draeder

I'll deploy 2 or 3.

hello-smile6 avatar Jun 13 '21 22:06 hello-smile6

@hello-smile6 Well, I got back to writing this today. I have it nearly complete for a first pass. I'll come back soon and post the repo link. It's called signal-swarm. I could definitely use some testers when it's ready.

draeder avatar Jun 13 '21 23:06 draeder

> @hello-smile6 Well, I got back to writing this today. I have it nearly complete for a first pass. I'll come back soon and post the repo link. It's called signaling-swarm. I could definitely use some testers when its ready.

I will if there's an interface like instant.io I could use for it, @draeder . I've always wanted a better-quality version.

hello-smile6 avatar Jun 14 '21 00:06 hello-smile6

@hello-smile6 It's a tracker server implementation that communicates with other tracker servers based on a shared "topic" or "app name". To participate, you have to have a domain set up in Cloudflare.

draeder avatar Jun 14 '21 02:06 draeder

Huh. Could you deploy to Glitch?

hello-smile6 avatar Jun 14 '21 02:06 hello-smile6