[Feature] Federate metadata
Requirements
- [X] Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
- [X] Did you check to see if this issue already exists?
- [X] Is this only a feature request? Do not put multiple feature requests in one issue.
- [X] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
- [X] Do you agree to follow the rules in our Code of Conduct?
Is your proposal related to a problem?
I always assumed my Lemmy instance was only making connections to other instances, but it's constantly making GET requests to all kinds of non-Lemmy web servers to scrape metadata.
I'm not too happy about that. For example, Reddit may rate-limit my IP, other websites might block me for constant scraping, and my server might connect to all kinds of shady or illegal websites. Also, as the number of instances grows, all of them fetching the metadata at roughly the same time would amount to a DoS attack on the external URL.
Could there be an option to disable metadata scraping for federated posts? I don't mind my server connecting to other Lemmy servers, I just want it to trust the received metadata (thumbnail URL and summary) instead of crawling the whole web just to verify it. For local posts the metadata fetching is fine.
Describe the solution you'd like.
A setting to disable the fetching of metadata for federated (remote) posts, and trust those fields from the remote instance.
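As a rough illustration of the requested behaviour, the decision could look something like the sketch below. All names here (`PostOrigin`, `MetadataSettings`, `fetch_for_federated_posts`, `should_fetch_metadata`) are hypothetical and are not existing Lemmy types or config options.

```rust
/// Where a post came from (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostOrigin {
    /// Created by a user on this instance.
    Local,
    /// Received from another instance via federation.
    Federated,
}

/// Hypothetical per-instance setting; not an existing Lemmy option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetadataSettings {
    /// When false, the linked site is never contacted for federated posts
    /// and the fields sent by the remote instance are trusted as-is.
    pub fetch_for_federated_posts: bool,
}

/// Decide whether this instance should request the linked URL itself.
pub fn should_fetch_metadata(settings: MetadataSettings, origin: PostOrigin) -> bool {
    match origin {
        // Local posts always fetch, as the request above considers that fine.
        PostOrigin::Local => true,
        // Federated posts only fetch when the admin has opted in.
        PostOrigin::Federated => settings.fetch_for_federated_posts,
    }
}
```

With `fetch_for_federated_posts` set to `false`, only posts created on the instance itself would trigger outgoing requests to the linked site.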
Describe alternatives you've considered.
N/A
Additional context
No response
I can't really figure out the root reason we do this at all. Here's the line where we convert from Page -> Post, and these are the potential fields that are updated in the metadata fetch.
I think Page already has the thumbnail_url, but the others we might be able to federate as attachments or something, if they aren't already?
cc @Nutomic
The thumbnail is stored, which afaik is also used by other ActivityPub platforms. Afaik other platforms, including Mastodon, don't federate link metadata in any way, but we could do it by adding some fields to Link.
One problem is that a malicious instance could put wrong info into the metadata. Also, other platforms (KBin, Piefed etc.) wouldn't send this metadata, so we would be forced to fetch it again (or specifically add a setting to disable metadata fetching).
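To make the fallback concrete, here is a minimal sketch of how a receiving instance might resolve link metadata. It assumes a hypothetical `LinkMetadata` struct and `resolve_metadata` helper, neither of which is Lemmy's actual code, and the `fetch` closure stands in for whatever performs the outgoing request. It does nothing to validate what a remote instance sends, so the concern about malicious metadata still applies.

```rust
/// Hypothetical container for the fields discussed in this thread.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct LinkMetadata {
    pub thumbnail_url: Option<String>,
    pub embed_description: Option<String>,
}

/// Prefer metadata sent by the remote instance. If none was sent (for
/// example by a platform that does not federate it), fetch it only when
/// the admin allows that; otherwise leave the fields blank.
pub fn resolve_metadata<F>(
    federated: Option<LinkMetadata>,
    allow_fetch: bool,
    fetch: F,
) -> LinkMetadata
where
    F: FnOnce() -> Option<LinkMetadata>,
{
    match federated {
        // Trust what the remote instance sent; no outgoing request is made.
        Some(meta) => meta,
        // Nothing was federated and fetching is allowed: ask the linked site.
        None if allow_fetch => fetch().unwrap_or_default(),
        // Nothing was federated and fetching is disabled: blank metadata.
        None => LinkMetadata::default(),
    }
}
```

Taking the fetcher as a closure keeps the network call out of the decision logic, so the policy can be tested without contacting any site.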
I think this should be turned into a two-step process.
The first priority would be a setting to disable metadata fetching for federated posts. The only thing one would lose is the embed_description, as it would be set to a blank value, which is fine for me.
This would already increase federation performance (and privacy!) by a great amount, because, as described in other issues, slow metadata fetching can be the reason why instances lag behind.
The next step would be to federate the value of embed_description so that you experience no loss of functionality, even when you turn fetching off. I can see very little security impact from a malicious instance, as you already trust 90% of the post metadata (like the thumbnail URL) anyway. So if you trust the thumbnail, why wouldn't you trust the description?
I am perfectly fine with just trusting the thumbnail (and leaving the description blank). So I would already be happy with a setting to disable fetching of the description until it's federated.
> This would already increase federation performance (and privacy!) by a great amount, because, as described in other issues, slow metadata fetching can be the reason why instances lag behind.
This is already fixed by https://github.com/LemmyNet/lemmy/pull/4564. But federating the embed info makes sense.
Up...