
Support Bittorrent for data distribution

Open alsutton opened this issue 3 years ago • 5 comments

Pitch

To reduce the load on originating servers, particularly for large media files, BitTorrent support could be used to spread the distribution load across the fediverse.

The originating instance could host a BitTorrent tracker, and other instances or clients could use it to obtain the complete toot, media file, or other content from several servers, rather than everyone trying to get it from one place and slamming a single instance.
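Fetching pieces "from different servers" works without trusting those servers because BitTorrent publishes a hash for every piece of the file (SHA-1 per piece in BitTorrent v1), and a downloader checks each piece against it. The Python sketch below illustrates only that verification idea. It is not a BitTorrent client or Mastodon code, and the tiny `PIECE_SIZE`, the `peers` layout and the function names are hypothetical.

```python
import hashlib

# Hypothetical, deliberately tiny piece size; real torrents use far larger pieces.
PIECE_SIZE = 4

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Hashes the originating instance would publish: SHA-1 of each piece."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def assemble(hashes: list[bytes], peers: dict[str, dict[int, bytes]]) -> bytes:
    """Take each piece from any peer whose copy matches the published hash."""
    pieces = []
    for index, expected in enumerate(hashes):
        for held in peers.values():
            candidate = held.get(index)
            if candidate is not None and hashlib.sha1(candidate).digest() == expected:
                pieces.append(candidate)
                break
        else:
            # No peer offered a copy that verifies; give up on this file.
            raise ValueError(f"no valid copy of piece {index}")
    return b"".join(pieces)

# Usage: three hypothetical instances each hold some pieces; one copy is corrupt.
media = b"hello fediverse!"  # 16 bytes -> 4 pieces of 4 bytes
hashes = piece_hashes(media)
peers = {
    "instance-a": {0: b"hell", 1: b"XXXX"},  # piece 1 is corrupted and gets rejected
    "instance-b": {1: b"o fe", 2: b"dive"},
    "instance-c": {3: b"rse!"},
}
print(assemble(hashes, peers))  # -> b'hello fediverse!'
```

Because integrity comes from the published hashes rather than from the server, the originating instance only has to serve the small hash list reliably, while the bulk bytes can come from anywhere.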

Motivation

This would allow instance operators to serve more users effectively because the bandwidth required to serve a large file across the fediverse would be distributed across many instances and clients.

alsutton, Nov 25 '22 19:11

Somewhat related: https://github.com/mastodon/mastodon/issues/21461

afontenot, Nov 25 '22 23:11

Media is served from the user's "local instance" anyway (for privacy reasons), so the only way an instance would get slammed (and most instances serve media from a CDN anyway) would be if a lot of their users tried to view it at the same time and/or it suddenly got highly federated and all the remote instances (of which there are "only" a few thousand) tried to fetch it at the same time.

Whilst BitTorrent sounds like a good idea, it'll be blocked by many corporate firewalls, and quite a few datacentres ban running BT servers of any type (legit or not: they just don't want the risk). There'll also be long delays in propagation, and there are privacy issues, since you can get the remote user's IP address etc.

rbairwell, Nov 26 '22 00:11

> would be if a lot of their users tried to view it at the same time and/or it suddenly got highly federated and all the remote instances (of which there are "only" a few thousand) tried to fetch it at the same time.

This happens all the time though, right? An extremely popular account might be followed by users on a significant fraction of instances. Everything they post means that a message has to be sent to every single instance that has even one user following them. If that message contains a link back to an image, most instances will immediately prefetch and cache the image. Increases to the overall size of the network mean an increasing outgoing bandwidth burden on these instances. Prefetching is more or less required for Mastodon because all (or almost all?) instances have a federated feed that allows immediately viewing all incoming publicly visible content.
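A rough back-of-the-envelope model makes the fan-out burden concrete. The sketch below compares origin upload when every fetcher pulls a full copy over HTTP with an idealized swarm in which the origin uploads each piece only once. The 2 MB image, the instance counts and the function names are hypothetical illustrations, not measurements, and the swarm figure is a best case that ignores protocol overhead and assumes peers upload as much as they download.

```python
# Idealized, hypothetical model of origin upload for a single media attachment.

def origin_egress_http(fetchers: int, media_bytes: int) -> int:
    """Every fetcher pulls its own full copy from the originating instance."""
    return fetchers * media_bytes

def origin_egress_swarm(media_bytes: int, seed_copies: int = 1) -> int:
    """Best case for a swarm: the origin uploads each piece `seed_copies` times
    and the remaining fetchers exchange pieces among themselves."""
    return seed_copies * media_bytes

MB = 10**6
image = 2 * MB  # hypothetical 2 MB image

print(origin_egress_http(3_000, image) // MB)  # a few thousand instances -> 6000 (MB)
print(origin_egress_http(10**6, image) // MB)  # a million fetchers -> 2000000 (MB)
print(origin_egress_swarm(image) // MB)        # idealized swarm -> 2 (MB)
```

The total bytes moved across the network are the same in both cases; a swarm only changes who uploads them, which is exactly the "share this bandwidth burden" point.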

I think the proposal is that all Mastodon instances should share this bandwidth burden through some mechanism. You're right that BT might not be the best approach.

afontenot, Nov 26 '22 02:11

Thanks for the feedback. I'll try to address the issues that have been raised:

  • This is to help the system scale, and is not necessarily a widespread problem now. The bird site has shown us how many followers some folk can have (millions), and we already have accounts closing in on the 100,000-follower mark (e.g. https://mastodonapp.uk/@stephenfry and https://mastodon.nu/@gretathunberg), so the aim here is to help the fediverse scale up to support multiple accounts with millions of followers in the near future.

  • This would cover both inter-server and client distribution, so rather than considering just the current number of instances (around 10^4), I'm considering all distribution end points (10^6, possibly 10^7).

  • I assumed, possibly wrongly, that instance servers already have a capabilities API through which optional features, which some instances may not support, could be reported to clients. I don't see this as the only way to distribute data, but as an optimisation to make distribution less bottlenecked (i.e. less dependent on single nodes such as a post's home node). Support for BT could even be detected on the initial request to the BT tracker: if there's no response, fall back to HTTP; if there is a response, the tracker will already know which instances can support BT distribution, because those that don't support BT won't register with the tracker.

  • I proposed BitTorrent to avoid the XKCD 927 problem (creating yet another competing standard). BT has already had years of widespread testing, and I'm aware that some companies use it to distribute data to more servers than we have instances, so it looks like a good fit for the problem. I'm unaware of any active large-scale IPFS deployments, but it'd be useful to compare the two.
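The tracker-first, HTTP-fallback idea in the third point above can be sketched as follows. This is a minimal illustration under stated assumptions: `fetch_media` and the three hooks (`query_tracker`, `swarm_fetch`, `http_fetch`) are hypothetical and are not existing Mastodon or BitTorrent APIs, and treating an empty peer list like "no response" is an added assumption, based on the point that non-BT instances never register.

```python
from typing import Callable, Optional, Sequence

# Hypothetical hooks; none of these are existing Mastodon APIs.
TrackerQuery = Callable[[str], Optional[Sequence[str]]]  # peers, or None if no response
SwarmFetch = Callable[[str, Sequence[str]], bytes]       # download from the listed peers
HttpFetch = Callable[[str], bytes]                       # plain download from the origin

def fetch_media(url: str, query_tracker: TrackerQuery,
                swarm_fetch: SwarmFetch, http_fetch: HttpFetch) -> tuple[str, bytes]:
    """Use the swarm when the tracker answers with peers, otherwise use HTTP."""
    peers = query_tracker(url)
    if not peers:
        # No tracker response, or no BT-capable instance has registered.
        return "http", http_fetch(url)
    return "bittorrent", swarm_fetch(url, peers)

# Usage with stub hooks standing in for real network code:
def http(url: str) -> bytes:
    return b"from origin"

def swarm(url: str, peers: Sequence[str]) -> bytes:
    return b"from %d peers" % len(peers)

def silent_tracker(url: str) -> Optional[Sequence[str]]:
    return None

def busy_tracker(url: str) -> Optional[Sequence[str]]:
    return ["a.example", "b.example"]

origin = "https://origin.example/media/1.png"
print(fetch_media(origin, silent_tracker, swarm, http))  # -> ('http', b'from origin')
print(fetch_media(origin, busy_tracker, swarm, http))    # -> ('bittorrent', b'from 2 peers')
```

Passing the network operations in as functions keeps the decision logic separate from any particular tracker or HTTP library, so the fallback rule can be checked without a network.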

Hopefully this adds some clarity which can help folk keep the discussion going.

alsutton, Nov 26 '22 08:11

Having read the IPFS suggestion, I see an additional benefit to BT: deletion.

If an instance operator wants to remove some data (e.g. some offensive media), the originating instance operator can remove the relevant entry from the BT tracker, which makes it effectively unavailable.
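A minimal sketch of the deletion idea, assuming a hypothetical in-memory tracker run by the originating instance. `MinimalTracker`, `publish`, `announce`, `remove` and the `"abc123"` info hash are illustrative names, not a real tracker's API or the BitTorrent wire protocol. Once an entry is removed, announces for that content return no peers; a real tracker could instead reply with a failure reason.

```python
class MinimalTracker:
    """Hypothetical in-memory tracker hosted by the originating instance."""

    def __init__(self) -> None:
        # info hash -> set of peers that have announced for it
        self._swarms: dict[str, set[str]] = {}

    def publish(self, info_hash: str) -> None:
        """Start coordinating a swarm for newly posted content."""
        self._swarms.setdefault(info_hash, set())

    def announce(self, info_hash: str, peer: str) -> list[str]:
        """Register `peer` and return the other peers it can fetch from."""
        if info_hash not in self._swarms:
            # Unknown or removed content: hand out no peers.
            return []
        swarm = self._swarms[info_hash]
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others

    def remove(self, info_hash: str) -> None:
        """Operator deletes the content: stop coordinating its swarm."""
        self._swarms.pop(info_hash, None)

# Usage:
tracker = MinimalTracker()
tracker.publish("abc123")
print(tracker.announce("abc123", "instance-a"))  # -> []
print(tracker.announce("abc123", "instance-b"))  # -> ['instance-a']
tracker.remove("abc123")
print(tracker.announce("abc123", "instance-c"))  # -> []
```

The sketch models only the tracker. Peers that already hold the pieces, or that find each other through DHT or peer exchange, are outside what it shows.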

alsutton, Nov 26 '22 08:11