PasteHunter
PasteHunter copied to clipboard
[Feature] Add per input rate limit.
Sites like slexy have heavy rate limits. - https://github.com/kevthehermit/PasteHunter/issues/52
Add a configurable rate limit per input source
Hello @kevthehermit,
I'm the maintainer of Slexy.org. I opened a PR a few minutes ago to help remedy some Slexy Terms of Service violations. The rate limiting/throttling is the next step, but I am unable to implement this. Is this in the short-term project roadmap?
Lack of throttling + excessive HTTP -> HTTPS redirects + non-identifying UA have all led to a lot of folks being banned. Lack of throttling still causes temporary or permanent bans. e.g. https://github.com/kevthehermit/PasteHunter/issues/52
My PR: https://github.com/kevthehermit/PasteHunter/pull/94
Regards, @thephpjedi
FYI @thephpjedi this will likely be a significant lift due to the multithreading we're doing and require a large rewrite of a pretty core part of the application. I have been poking at it, but it will take a while.
If you wanted, you could make it a LOT easier by simply imposing a ratelimit and returning the remaining requests in a header, similar to how github handles things, which would avoid the need for us to track and lock the remaining ratelimit on our side, and would make this restriction more obvious to developers in the future. Not a necessity, but would make this happen a lot more easily if you enforced that limit on your side by simply not returning data. It'd also be less nuclear than banning, which would be nice (although you could reduce this value or ban serial abusers) Thread safety is a pain.
Hi @Plazmaz,
Thanks for the additional context.
Glad to provide rate limit info via headers, especially if it makes it easier to implement throttling on your end. I also plan to return HTTP status code 429 (Too Many Requests) when limit is exceeded in most cases, instead of dropping packets in all cases. Both would result in better visibility and communication to users and developers.
These items were recently added to my roadmap, but I'll prioritize them for sooner so you and others can benefit due to the interest received in recent weeks.
I'll comment here once this is implemented. Can't promise anything earlier than end of January, but ambitiously hope to get it done now in December.
I really appreciate it, that will make life a lot easier. Thanks!
Hi @Plazmaz,
This week I implemented rate limiting logic on Slexy.org, largely aligned with earlier comments. Full list of current rate limits and acceptable requests are in the Slexy.org Terms of Service.
This is subject to change while I fine-tune the logic, but as of Friday evening the relevant behavior is:
-
Max 30 requests per 30 seconds allowed.
-
Max 3 HTTP requests allowed per 60 seconds. (See PR https://github.com/kevthehermit/PasteHunter/pull/94.)
-
Any requests which exceed a rate limit will have HTTP status code 429 returned, with the specific reason provided in the response document.
-
All responses include
RateLimit-Limit
andRateLimit-Remaining
headers. For non-429 HTTPS responses, all-request values are used (RateLimit-Limit: 30
). For HTTP responses, HTTP-request values are used (RateLimit-Limit: 3
). Other types of requests will use the appropriate values.Currently, these headers provide values for the all-requests limit (30 reqs per 30 secs). This should be sufficient info for your purposes, but please note I may return different values for HTTP requests soon (e.g. HTTPS:
RateLimit-Limit: 30
,RateLimit-Remaining: 29
; HTTP:RateLimit-Limit 3
,RateLimit-Remaining 2
). -
Status 429 responses also include a
Retry-After
header. Currently, the value is always60
even if the cooldown period is shorter. This may change in the future, but waiting 60 seconds is more than enough and ideally you should not be hitting rate limits in the first place.
Let me know if you encounter any unexpected header values, because a few days ago I had some edge cases which resulted in undefined behavior (since fixed, but I may have other unidentified edge cases). Feel free to send feedback here or via contact form.
Hey, thanks for the update. I will attempt to take a whack at this as soon as I get some spare time. I really appreciate the effort you've put in to make this easier for us, and I'll update this issue with any challenges or strange behaviors I run into. Thanks!
@thephpjedi we should now be compliant with your ToS and ratelimit (as of 1.3.0).
Moving back on topic for this issue, the best approach will probably be with a base object for paste inputs. This is something I've wanted to do for a while (central objects for inputs, outputs, and postprocessing will remove a LOT of redundant logic)