
Federating/Proxying links and media

Open 0xDEADCADE opened this issue 1 year ago • 9 comments

Is your proposal related to a problem?

Somewhat. I wouldn't classify it as a bug, but it is an issue I ran into. When a user shares a link to a post, comment, or community, or shares media (such as an image hosted on their instance), that link is copied over as-is. As a result, while using Lemmy the client contacts a lot of other instances for media, and linked posts or communities often have to be found by searching manually, because the link usually points to another instance.

Describe the solution you'd like

Converting links to posts, comments, communities, or media from other instances into links handled directly by the instance that's displaying them. For posts, comments, and communities, it should retrieve the item from the other instance as if it had been searched for, then display it on the local instance. For media, it should either download the media, similar to what happens when trying to view Mastodon posts from Lemmy, or proxy the request to the other server directly (maybe with caching).

This ensures the client does not contact other instances "at random", which prevents correlation of which accounts belong to which clients and their IP addresses. It would be more similar to the Matrix approach, where a client can access any local or federated content only by connecting directly to their chosen homeserver.

Converting links locally would retain compatibility with other Fediverse projects, unlike the alternative below.
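To make the link conversion idea above a bit more concrete, here is a rough Python sketch, not an actual Lemmy implementation. The function name `localize_link` and the host `lemmy.example` are made up for illustration. It assumes remote communities can be addressed locally as `/c/name@host`, and that posts and comments (whose IDs differ per instance) are handed to the local search page via `/search?q=`; both are assumptions about the UI, and the sketch cannot tell whether the remote host really is a Lemmy instance.

```python
import re
from urllib.parse import quote, urlsplit

# Path shapes seen in this thread, e.g. /post/433623 and /c/<name>.
_ITEM_PATH = re.compile(r"^/(post|comment)/\d+$")
_COMMUNITY_PATH = re.compile(r"^/c/([A-Za-z0-9_]+)$")


def localize_link(url: str, local_host: str) -> str:
    """Rewrite a link to a remote item so it opens on the local instance.

    Hypothetical helper: links that do not look like a remote post,
    comment, or community are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None or host == local_host:
        return url  # relative link, or already local

    community = _COMMUNITY_PATH.match(parts.path)
    if community:
        # Assumption: remote communities are reachable as /c/name@host.
        return f"https://{local_host}/c/{community.group(1)}@{host}"

    if _ITEM_PATH.match(parts.path):
        # Post/comment IDs are instance-local, so hand the original URL
        # to the local search page (assumed route) to resolve it.
        return f"https://{local_host}/search?q={quote(url, safe='')}"

    return url
```

For example, `localize_link("https://slrpnk.net/post/433623", "lemmy.example")` would produce a link to the local search page carrying the original URL, so the user stays on their own instance.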

Describe alternatives you've considered

A system similar to matrix.to or mxc URIs could be used, but it would need to be adopted across multiple instances for it to function, and other Fediverse projects would no longer be able to interact easily with Lemmy posts. It would also have the downside of requiring more features to be implemented on each client for proper use, breaking compatibility with existing clients.

For links to posts, comments, or communities, there isn't much of an advantage to using something similar to matrix.to, as the link to an item could simply be detected and converted locally. For media, there would be an opportunity to decentralize more, and save bandwidth, by caching media on each server, and by optionally specifying a protocol. A "lemmy content" URI could look similar to this: lmc://direct/lemmy.ml/MEDIA-ID, where the only difference to Matrix MXCs is the direct "protocol" specifier. This could be used to link directly to other media sites, such as imgur: lmc://imgur/lemmy.ml/IMGUR-ID. Keeping the originating instance included means that instances not capable of handling the imgur "protocol" could simply proxy the media request similar to direct "protocol" media.
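As a sketch of how a client or instance might handle such a URI (purely illustrative, since the `lmc://` scheme is only a proposal here and nothing implements it), the following Python parses the `lmc://PROTOCOL/INSTANCE/ID` shape and falls back to `direct` when the "protocol" is not supported. The names `LemmyContentUri`, `parse_lmc`, and `choose_handler` are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LemmyContentUri:
    protocol: str  # "direct", "imgur", ... (the proposed specifier)
    instance: str  # originating instance, e.g. "lemmy.ml"
    media_id: str  # opaque media identifier


def parse_lmc(uri: str) -> LemmyContentUri:
    """Parse a hypothetical lmc://PROTOCOL/INSTANCE/ID URI."""
    parts = urlsplit(uri)
    if parts.scheme != "lmc":
        raise ValueError(f"not an lmc URI: {uri!r}")
    # urlsplit puts the "protocol" specifier in netloc and the
    # instance plus media ID in the path.
    segments = parts.path.strip("/").split("/")
    if not parts.netloc or len(segments) != 2 or not all(segments):
        raise ValueError(f"malformed lmc URI: {uri!r}")
    return LemmyContentUri(parts.netloc, segments[0], segments[1])


def choose_handler(content: LemmyContentUri, supported: set) -> str:
    """Pick the protocol to fetch with, falling back to 'direct'.

    'direct' means: ask the originating instance to serve or proxy
    the media, as described in the paragraph above.
    """
    return content.protocol if content.protocol in supported else "direct"
```

An instance that only knows `direct` would parse `lmc://imgur/lemmy.ml/IMGUR-ID` and still end up requesting the media through lemmy.ml, which is the fallback behaviour described above.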

This would require a lot of extra work and effort, and wouldn't be compatible with instances not running the newer version or with other Fediverse projects. It would probably be preferable to detect and convert links on each instance separately.

Additional context

As a frequent Matrix user, I found it quite surprising that Lemmy is so quick to load media from other servers. Neither the web UI nor Jerboa shows a warning or prompt before loading images, profile pictures, and other media directly from an instance that you're not logged in on. This is a privacy concern, as any instance that federates with the instance you're using has essentially as much info about your client as the instance you're logged in on. It also heavily degrades the general browsing experience: you get linked to a post and cannot open it in your preferred instance (where you're likely already logged in). It could also have an impact on users in countries where a lot of censorship or blocking of URLs or domains occurs, as they would not be able to view most linked posts unless those links are locally converted to an instance that isn't blocked in their country.

0xDEADCADE avatar Jun 07 '23 08:06 0xDEADCADE

This could be implemented with the GET /image/download?url={url} endpoint in pict-rs. However it would probably require some changes so that images are not reencoded in that case, and also deleted after some time (or if the cache gets too big).
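A minimal Python sketch of building such a request URL, assuming only the `GET /image/download?url={url}` shape quoted above. The helper name and the pict-rs base address in the example are assumptions for illustration, and the sketch does not address the re-encoding or cache expiry concerns.

```python
from urllib.parse import urlencode


def pictrs_download_url(pictrs_base: str, remote_url: str) -> str:
    """Build a pict-rs download URL for a remote image.

    pictrs_base is wherever the instance's pict-rs is reachable
    (assumed here; it depends on the deployment).
    """
    # urlencode percent-encodes the remote URL so it survives as a
    # single query parameter.
    query = urlencode({"url": remote_url})
    return f"{pictrs_base.rstrip('/')}/image/download?{query}"
```

For example, `pictrs_download_url("http://127.0.0.1:8080", "https://i.imgur.com/a.png")` yields a URL whose `url` parameter carries the percent-encoded imgur link.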

In any case, contributions welcome!

Nutomic avatar Jun 07 '23 09:06 Nutomic

That would be the media part of it, yes. The user experience of browsing around different communities and posts would be a more difficult change, but it is still important, so that users are not confused by being sent to other instances.

0xDEADCADE avatar Jun 07 '23 11:06 0xDEADCADE

You mean this issue?

Nutomic avatar Jun 07 '23 12:06 Nutomic

Agreed, an image proxying service would be nice to have.

dessalines avatar Jun 07 '23 19:06 dessalines

I am actually quite confused by this. I always thought proxying of media was not implemented in Lemmy (other than some small thumbnail previews), but it turns out that it does already proxy images, though very inconsistently.

Take this post for example: https://slrpnk.net/post/433623

(Edit: it seems to have changed in 0.18.0? Now this image is proxied by sh.itjust.works)

This is a post by a remote user, in a remote community, yet the image is clearly served from my domain.

While I understand that there are certain scaling advantages to this, it would be very much preferable to be able to turn this media proxy off, or at least have an allow-list for it, as smaller instances might not be able to afford the storage space and bandwidth needed for it. In addition, there is the legal consideration of not replicating images on your server, due to copyright and other problems such as CSAM.

Edit: it seems like it only proxies images that are not on other Lemmy instances? Like in the above case I didn't notice that the original link was on an imgur host. That seems like a nice privacy feature, but kinda makes the reason why people upload images to external hosts (to not overload the Lemmy instance) a bit pointless.

poVoq avatar Jun 17 '23 12:06 poVoq

Take this post for example: https://slrpnk.net/post/433623

This is a post by a remote user, in a remote community, yet the image is clearly served from my domain.

When I view your community's post list (Chrome on desktop Linux), https://slrpnk.net/c/[email protected], the thumbnail is served from the originating site, not from slrpnk.net or i.imgur.com. The thumbnail URL is: https://sh.itjust.works/pictrs/image/0441bb00-4ac3-403e-b5ca-f4361e8f4ed4.jpeg?format=webp&thumbnail=256

RocketDerp avatar Jun 18 '23 03:06 RocketDerp

Yes, a media proxy is a really important feature!

Especially for privacy reasons and probably also for GDPR compliance. It can't be good if the UI is wildly connecting to random unknown servers.

Considering that an IP address is personal information, this is actually a big issue, because you (as the instance operator) have absolutely no idea with whom you are sharing your users' personal information.

However, using a custom protocol like the one from Matrix is probably way more complicated than it needs to be. I guess it's much easier to look at Mastodon, Pleroma, Calckey, and so on. They all have a media proxy. In most of them it is part of the caching feature (which is optional but, for bigger instances, rather useful).

ghost avatar Jun 20 '23 13:06 ghost

Related: https://github.com/LemmyNet/lemmy/issues/3474 https://github.com/LemmyNet/lemmy/issues/1036

ghost avatar Jul 05 '23 12:07 ghost

I also recently observed that while the home instance does proxy images from places like imgur for local users, it seems to sometimes send only the original URL via federation to other servers, which in turn try to load it directly from Imgur.

I also noted that the behaviour I described in a post further up changed after updating to a later version of Lemmy. Now the image does seem to be proxied by the remote instance?

Really the most confusing thing is that there seems to be 0 consistency, and it all seems kinda randomly based on some unknown failure mode in Pictrs.

poVoq avatar Jul 05 '23 12:07 poVoq

A note here: pictrs plans to add media proxy support in version 5, so we should wait for that.

https://git.asonix.dog/asonix/pict-rs/issues/9

dessalines avatar Jul 18 '23 15:07 dessalines

A very simple first step we could take is to allow setting a media proxy URL. If it is set, this URL is prepended to the media URL. This way it would be easy to create a pull-through proxy for images with nginx or whatever.
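Roughly, the prepending step could look like the Python sketch below. This is an illustration under assumptions, not Lemmy code: the function name, the `proxy_prefix` setting, and the decision to percent-encode the media URL and to skip media already hosted locally are all choices made for the example.

```python
from typing import Optional
from urllib.parse import quote, urlsplit


def apply_media_proxy(media_url: str, proxy_prefix: Optional[str],
                      local_host: str) -> str:
    """Prepend the configured proxy URL to a remote media URL.

    proxy_prefix is the hypothetical "media proxy url" setting;
    when it is unset, media URLs are passed through untouched.
    """
    if not proxy_prefix:
        return media_url
    if urlsplit(media_url).hostname == local_host:
        return media_url  # already served by this instance
    # Percent-encode so the whole media URL is one opaque component
    # that a pull-through proxy (nginx or similar) can decode.
    return proxy_prefix + quote(media_url, safe="")
```

With a prefix such as `https://lemmy.example/proxy?url=` (a made-up endpoint), every remote image URL handed to the client would point at the local proxy instead of the remote host.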

cperrin88 avatar Jul 21 '23 10:07 cperrin88

https://github.com/LemmyNet/lemmy/pull/4035

Nutomic avatar Jan 29 '24 13:01 Nutomic