Allow bypassing proxy for some domains with ProxyAllImages
This PR introduces a new settings parameter, proxy_bypass_domains, which allows admins to configure some domains for which image proxying will be skipped.
The skipping logic lives in the image_proxy/ endpoint itself, so URLs do not have to change whenever admins modify the bypass list. If an image's domain is on the bypass list, Lemmy issues an HTTP redirect to the source image instead of proxying it through pict-rs.
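To illustrate the decision described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `ProxyAction`, `host_of`, and `decide` are hypothetical names, the URL parsing is deliberately simplified (the real code would use a proper URL parser), and matching the host exactly against the list, rather than also matching subdomains, is an assumption.

```rust
use std::collections::HashSet;

/// What the image_proxy endpoint does with a requested image URL (hypothetical type).
#[derive(Debug, PartialEq)]
enum ProxyAction {
    /// Answer with an HTTP redirect to the original image URL.
    Redirect(String),
    /// Fetch the image through pict-rs as usual.
    Proxy(String),
}

/// Extracts the host from an http(s) URL.
/// Simplified on purpose: no IPv6 literals, no percent-decoding.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority part ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop userinfo ("user@") and port (":8080") if present.
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Redirects when the host is on the bypass list, proxies otherwise.
/// Assumption: entries are lowercase and must match the host exactly.
fn decide(url: &str, bypass_domains: &HashSet<String>) -> ProxyAction {
    match host_of(url) {
        Some(host) if bypass_domains.contains(host.to_ascii_lowercase().as_str()) => {
            ProxyAction::Redirect(url.to_string())
        }
        _ => ProxyAction::Proxy(url.to_string()),
    }
}
```

Because the check runs on every request to the endpoint, adding or removing a domain from the list takes effect without rewriting any stored image URLs.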
There are a couple of reasons such a configuration can be useful. For example, imgur will very quickly start blocking all requests from Lemmy when running with ProxyAllImages, due to its rate limits. Additionally, a few Lemmy instances provide their own external image hosting service; this setting lets them avoid duplicating the same images in two places.
I would consider bypassing some domains by default, such as imgur, which you mentioned, because most admins won't know which ones can break. It should still be possible to remove the default values, though. If that's too complicated, it would also be good to list problematic domains somewhere in the documentation.
There should be a change to how requests for proxying are done:
- Retry requests when a rate limit duration is returned
- In a horizontally scaled setup, if the current process is under a rate limit that ends later than that of another process with a different IP address, hand the request to that other process
- Maybe allow a rate limiting domain to temporarily bypass the proxy if it has enough reputation, which can be determined by how many existing posts link to the domain
  - Only count local posts, since other instances could easily exceed the limit if they are malicious or have a weaker post creation rate limit
  - I think it should be allowed for domains that are used in at least 100 posts and at least 5% of all posts (the 100 posts requirement makes a difference if there are fewer than 2000 posts)
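
The reputation threshold suggested above can be sketched as a small predicate. This is only an illustration of the arithmetic: the function name and parameters are hypothetical, and how the post counts would be obtained (for example, a query restricted to local posts) is left out.

```rust
/// Returns true if a domain is linked often enough to temporarily bypass the proxy.
/// Both counts are assumed to cover local posts only.
fn has_enough_reputation(domain_posts: u64, total_local_posts: u64) -> bool {
    // Absolute floor: at least 100 posts link to the domain.
    const MIN_POSTS: u64 = 100;
    // Relative floor: at least 5% of all posts, i.e. domain_posts * 20 >= total.
    // saturating_mul avoids overflow on very large counts.
    domain_posts >= MIN_POSTS && domain_posts.saturating_mul(20) >= total_local_posts
}
```

With fewer than 2000 local posts, 5% is below 100, so the absolute floor is the binding condition; from 2000 posts upward the 5% share takes over.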