[Feed Issue]: Blocked from openrss.org
Feed URL
https://openrss.org/rss
Website URL
https://openrss.org
Problem Description
All subscribed feeds from OpenRSS return a 403 Forbidden error, and viewing the openrss.org website from the same IP address shows a "Blocked" message.
Expected Behavior
OpenRSS has an open issue from 12/14/2024, saying "Some users of the miniflux application are blocked due to the application requesting too many feeds at the same time." - https://openrss.org/issue/174
Rate limiting requests to openrss.org, and perhaps a per-domain request rate limit in general, would help avoid these blocks.
Relevant Logs or Error Output
No response
Additional Context
No response
Troubleshooting Steps
- [x] I have checked if the feed URL is correct and accessible in a web browser.
- [x] I have checked if the feed URL is correct and accessible with curl.
- [ ] I have verified that the feed is valid using an RSS/Atom validator.
- [x] I have searched for existing issues to avoid duplicates.
The correct approach is to respond with a 429 Too Many Requests status instead of blocking Miniflux users.
Miniflux supports `429` responses along with the `Retry-After` header. Additionally, it supports the use of `ETag`, `Last-Modified`, `Cache-Control: max-age`, `Expires` headers, and the RSS `<ttl>` field to prevent unnecessary requests.
Manual refreshes from the web UI are limited to a 30-minute interval by default.
Even with these safeguards in place, users can still set up Miniflux however they want.
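For context, here's a rough sketch of what honoring those headers can look like in a feed fetcher. This is illustrative only, not Miniflux's actual code; the function and parameter names are made up:

```go
package fetcher

import (
	"net/http"
	"strconv"
	"time"
)

// conditionalFetch is a sketch, not Miniflux's implementation: it revalidates
// a feed with ETag/Last-Modified and returns a back-off delay when the server
// answers 429 Too Many Requests.
func conditionalFetch(client *http.Client, url, etag, lastModified string) (*http.Response, time.Duration, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, 0, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag) // server answers 304 if unchanged
	}
	if lastModified != "" {
		req.Header.Set("If-Modified-Since", lastModified)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, 0, err
	}
	switch resp.StatusCode {
	case http.StatusNotModified:
		return resp, 0, nil // nothing new to download or parse
	case http.StatusTooManyRequests:
		// Retry-After in its seconds form; it can also be an HTTP date.
		if s, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
			return resp, time.Duration(s) * time.Second, nil
		}
		return resp, time.Minute, nil // header missing: pick a conservative default
	}
	return resp, 0, nil
}
```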
Looking at their issue tracker (https://openrss.org/issues) and this page (https://openrss.org/rss-feed-readers), they seem to block an impressive number of feed readers.
They already do talk about responding with 429s at https://openrss.org/guides/developers-guide-to-open-rss-feeds#hold-off-when-you-get-a-429
OpenRSS isn't the only site that blocks my miniflux instance, and I haven't changed any defaults, btw. All these websites can't be wrong, so miniflux does have some issues with its requests.
rachelbythebay had a similar set of blog posts about feed readers causing issues for websites. I can appreciate them raising the issues so developers can resolve them.
> Miniflux supports `429` responses along with the `Retry-After` header. Additionally, it supports the use of `ETag`, `Last-Modified`, `Cache-Control: max-age`, `Expires` headers, and the RSS `<ttl>` field to prevent unnecessary requests. Manual refreshes from the web UI are limited to a 30-minute interval by default.
Are these done on a per-feed basis, or per-domain?
👋
Thanks for creating this issue
Aside, anyone got a short-term solution for this (e.g. openrss.org alt) - would still like to create/read feeds? 😣
🤞
@trekzavier on ticket three three eight two you mention this issue and say that miniflux doesn't "store and reuse content locally". What do you mean by that? What content is there to be reused when fetching a feed? Are you talking about the requests made by the "Fetch original content" option? Is there an issue related to that specifically?
I only ask because #3289 / openrss#174 are pretty clearly not about "reusing content" but rather about the way miniflux schedules parallel fetches to an individual site...
Quoth the URL listed under Expected Behavior:

> This is caused by the application failing to space out its requests.
> Miniflux supports `429` responses along with the `Retry-After` header. Additionally, it supports the use of `ETag`, `Last-Modified`, `Cache-Control: max-age`, `Expires` headers, and the RSS `<ttl>` field to prevent unnecessary requests. Manual refreshes from the web UI are limited to a 30-minute interval by default.
>
> Are these done on a per-feed basis, or per-domain?
This is per feed URL.
I think the issue OpenRSS is raising is that Miniflux makes several HTTP requests in parallel when refreshing feeds. By default, it uses 16 background threads (see the `WORKER_POOL_SIZE` config option).
If one or more Miniflux users subscribe to multiple OpenRSS feeds, OpenRSS might see multiple parallel HTTP requests from Miniflux.
Spacing out the requests, as suggested by OpenRSS, doesn't fully solve the problem because a Miniflux user can force a refresh from the web UI or the API. Additionally, refreshing feeds sequentially could take a very long time on a large Miniflux instance with many users.
By default, Miniflux schedules feed refreshes in a round-robin fashion, like many other RSS readers, nothing fancy. Implementing a smarter scheduler might help avoid refreshing feeds from the same domain at the same time, but that doesn't address the issue mentioned above.
I'm not sure what specific issue they're facing on their end or whether they have any caching in place. From my perspective, implementing rate limiting on their side would be the ideal solution. Otherwise, anyone could easily DDoS their platform, regardless of how RSS readers behave.
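To illustrate what rate limiting on their side could look like, here is a minimal sketch of an HTTP middleware that answers `429` with a `Retry-After` header instead of banning the client. It uses golang.org/x/time/rate; this is not OpenRSS's actual setup, just one way to do it:

```go
package ratelimit

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

// One token bucket per client key: 1 request per second, small burst allowance.
// Sketch of server-side limiting, not anything OpenRSS actually runs.
var (
	mu       sync.Mutex
	limiters = make(map[string]*rate.Limiter)
)

func limiterFor(key string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[key]
	if !ok {
		l = rate.NewLimiter(rate.Limit(1), 5)
		limiters[key] = l
	}
	return l
}

// Middleware answers 429 + Retry-After instead of blocking the client outright.
// A real server would strip the port from RemoteAddr before using it as a key.
func Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiterFor(r.RemoteAddr).Allow() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```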
Quoth openrss "space out your requests":

> ...space out your requests, such that each request is initiated at least 1-2 seconds after the previous request has been kicked off
Maybe a delay based on the sha1 of the path in the feed URL would be enough to spread the requests without building a whole new scheduler? Something like:
```perl
use Digest::SHA qw(sha1);
use Time::HiRes qw(usleep);

my $max_sleep = 2;  # seconds; needs tweaking, i bet
my $sleep_rank = ord(substr(sha1($feed_url->path), -1)) / 255;  # $feed_url: a URI object
usleep($max_sleep * $sleep_rank * 1_000_000);  # usleep takes microseconds
```
Personally:
- I'm on a single user instance
- I wouldn't notice an extra second or two on a single feed refresh
- I wouldn't notice a minute on the tail end of a "refresh all", since I mostly press it when my install of miniflux "feels frozen" and that fear disappears when the first new item appears
> Miniflux makes several HTTP requests in parallel when refreshing feeds. By default, it uses 16 background threads (see the `WORKER_POOL_SIZE` config option).
16 requests at the same time on the same domain is surely going to get a 429 or a block from most sites. It is my understanding that people can change this, but there should be a more site-respecting default.
> Spacing out the requests, as suggested by OpenRSS, doesn't fully solve the problem because a Miniflux user can force a refresh from the web UI or the API.
Well, this is a problem, right? I can see a user force-refreshing. But I don't understand why, when this is clicked, the app would still go out and request the website when the site's previous response has clearly said to serve the stored cached response until max-age expires, like browsers do.
> Additionally, refreshing feeds sequentially could take a very long time on a large Miniflux instance with many users.
I mentioned before that this has been addressed. If requests are not made round-robin, but spaced out, and subsequent requests use stored responses until the max-age of the previous response expires, then a user wouldn't have to wait. In other words, all of their recommendations need to be implemented for this to work properly. You can't just address one or two issues and then complain that the big problem isn't fully solved.
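For what it's worth, the scheduling change I'm describing is small. A sketch of the idea, assuming the reader keeps the previous response's headers around (the names here are illustrative, not Miniflux's internals):

```go
package cachecontrol

import (
	"net/http"
	"strconv"
	"strings"
	"time"
)

// NextPollAfter extracts max-age from a feed response's Cache-Control header
// and turns it into the earliest time the feed should be polled again.
func NextPollAfter(resp *http.Response, now time.Time) time.Time {
	for _, directive := range strings.Split(resp.Header.Get("Cache-Control"), ",") {
		directive = strings.TrimSpace(directive)
		if value, ok := strings.CutPrefix(directive, "max-age="); ok {
			if seconds, err := strconv.Atoi(value); err == nil {
				return now.Add(time.Duration(seconds) * time.Second)
			}
		}
	}
	return now // no max-age: fall back to the scheduler's normal interval
}
```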
@guest20 @trekzavier
Hi, if you need rate limits, feel free to try my fork https://github.com/dsh2dsh/miniflux where I implemented rate limits and some more features.
@dsh2dsh Any reason your rate limits can't be put into a PR? Sounds like it would be useful for folks running "upstream"
@guest20
Any reason your rate limits can't be put into a PR?
No reason, just not my goal. Anybody feel free to backport it.
@dsh2dsh I'm just a perl programmer, tehe, but I think you're talking about https://github.com/miniflux/v2/commit/1f14fbeb0cc22fafb1777fd3ad99498ddd790acd "Rate limit connections per server".
It doesn't seem like it handles the back-pressure headers at the heart of this issue, but it does look like setting `RATE_LIMIT_PER_SERVER=1` would be enough to convince openrss to not banninate your instance, since that's what they want: a single connection at most once a second.
@guest20
Correct. `RATE_LIMIT_PER_SERVER=1` is one connection per server per second.
@fguillot Do you think `git cherry-pick 1f14fbeb0cc22fafb1777fd3ad99498ddd790acd` from @dsh2dsh's branch is doable?
I think a blunt tool like `RATE_LIMIT_PER_SERVER=1` is, at worst, a good start, and at best, a whole solution for small instances dealing with "too many concurrent requests".
A more advanced `RATE_LIMIT_CONCURRENT[youtube.com]=15`-type mapping thing might be nice too‡.
‡ I'm not sure how go-enjoyers would put a mapping like that in the environment.
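One common answer to that footnote is a single comma-separated `host=limit` string in one variable. A sketch; `RATE_LIMIT_CONCURRENT` is hypothetical here, not an existing Miniflux option:

```go
package config

import (
	"os"
	"strconv"
	"strings"
)

// ParseHostLimits reads a hypothetical RATE_LIMIT_CONCURRENT variable such as
// "youtube.com=15,openrss.org=1" into a host -> limit map. Sketch only.
func ParseHostLimits() map[string]int {
	limits := make(map[string]int)
	for _, pair := range strings.Split(os.Getenv("RATE_LIMIT_CONCURRENT"), ",") {
		host, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			continue
		}
		if n, err := strconv.Atoi(value); err == nil {
			limits[host] = n
		}
	}
	return limits
}
```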
Miniflux 2.2.12 introduces a new configuration option, `POLLING_LIMIT_PER_HOST`, which should address the main request raised in the original issue. It may not cover every use case, but if you need something more advanced, feel free to open a pull request.
As for OpenRSS.org, it's their decision whether or not to unblock Miniflux.
- #3644
Thank you, but `POLLING_LIMIT_PER_HOST` isn't on by default, and it forces me to either enable it for every site or none at all, so this feature is basically useless. Why not just update Miniflux to poll a feed after the max-age in the Cache-Control headers expires, like other readers? Then the problem would be solved and there would be no need for this extra step.
@trekzavier TL;DR: your suggestion wouldn't solve the problem if implemented, and it is already implemented anyway.
This ticket is about openRSS's rule that you can't start multiple concurrent requests. Scheduling requests based only on max-age won't address that, because the scheduler runs every X minutes: a bunch of feeds can all exceed their max-age between two runs of the scheduler, which leads to them all being scheduled and fetched at exactly the same time... which is the thing openRSS is blocking Miniflux for.
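To make that failure mode concrete, here's a sketch (made-up names, not Miniflux's actual scheduler) of why feeds that come due in the same tick still need per-host staggering, even when every due time honors max-age:

```go
package scheduler

import (
	"net/url"
	"time"
)

// Feed is a stand-in for a subscription whose DueAt already honors max-age.
type Feed struct {
	URL   string
	DueAt time.Time // last fetch + max-age
}

// StaggerByHost spaces out feeds that are all due in the same scheduler tick,
// so no host sees more than one request per `gap`. Sketch only.
func StaggerByHost(due []Feed, now time.Time, gap time.Duration) map[string]time.Time {
	nextSlot := make(map[string]time.Time) // next free slot per host
	fetchAt := make(map[string]time.Time)  // feed URL -> scheduled fetch time
	for _, f := range due {
		u, err := url.Parse(f.URL)
		if err != nil {
			continue
		}
		slot := now
		if t, ok := nextSlot[u.Host]; ok && t.After(now) {
			slot = t
		}
		fetchAt[f.URL] = slot
		nextSlot[u.Host] = slot.Add(gap)
	}
	return fetchAt
}
```

With `gap` at 1-2 seconds, this is exactly the spacing openRSS asks for, without delaying feeds on other hosts.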
The upstream documentation and their suggested solution are both linked and pasted directly into this GitHub issue.
The author of the project, in the first comment on this issue, literally says
> Miniflux supports `429` responses along with the `Retry-After` header. Additionally, it supports the use of `ETag`, `Last-Modified`, `Cache-Control: max-age`, `Expires` headers, and the RSS `<ttl>` field to prevent unnecessary requests.
> This ticket is about openRSS's rule that you can't start multiple concurrent requests.
Well no, it's not. The ticket is about Miniflux being blocked from openrss because it is "requesting too many feeds at the same time."
Just because someone says that something is implemented doesn't mean that it is. I'm a Miniflux user, and I tested what the application does when a feed is added: max-age from any site is not respected. max-age will address the issue because the max-age is different for each feed; I believe this is true for openrss as well. If Miniflux waits until the max-age for a feed expires before making another request, it will never hit a 429, even if a user has subscribed to a bunch of openrss feeds.
If Miniflux is running on a scheduler and can't schedule feeds based on max-age, that is exactly the problem that should be fixed. It's kind of crazy for an app to blindly make a ton of round-robin requests to every subscribed feed and just ignore when a website says to slow down... and we wonder why websites are always blocking feed readers. This is why. If a website makes cache headers like max-age available, they should be used and taken into account in the scheduler.
I wouldn't even mind `POLLING_LIMIT_PER_HOST` as a workaround, but it's not useful if you can't restrict the feature to just a few websites. The way it is implemented now means that you must enable it for every website or none at all.
@trekzavier Just got back from vacation, catching up on my GitHub notifications, and I'd like to ask... "what?"... How is "you can't start multiple concurrent requests" different from "requesting too many feeds at the same time"?
And what do you mean by "just because someone says that something is implemented doesn't mean that it is"? Miniflux is open source... you can see where max-age is used to set the next feed fetch:
https://github.com/miniflux/v2/blob/e8f5c2446c9acfb89f0bf67176ce4c32e3ca9618/internal/reader/handler/handler.go#L321
> `POLLING_LIMIT_PER_HOST` as a workaround, ... enable it for every website or none at all.
So if you set it to 1, you have to wait 59 more seconds to see a new story from the 60th expired feed on an origin, if they all expire at the same time. That's going to be a much smaller delay than the one you'd see when the site bans you for making 60 parallel requests.