uaa
uaa copied to clipboard
feat: internal rate limiting for UAA
Implements a rate limiting feature directly into the UAA.
As a first remark: When nothing is done regarding configuration of this feature, the whole feature will stay disabled (only the request filter, that in this case directly forwards the request, will be used in the container).
However if configuration is done, this allows to rate limit the requests processed by the uaa. It is possible to configure rate limits for specific endpoints as well as a global limit for all requests. The limits can be targeted at either all users of a specific endpoint or targeted to specific users of the endpoint (as determined by e.g. the email from the used JWT).
The PR nearly contains only new coding, only the HomeController was extended with a new 429 error page and the web.xml was extended with the entry point (Filter) of the rate limiting. All the limiting is done in new classes.
The PR also includes some documentation on how the feature works internally and how it has to be confgured. Also Example configuration is included. A PR to allow this configuration to be done via uaa-release will follow once we have this in the UAA.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/182914786
The labels on this github issue will be updated when the story is started.
since this is new I would like to have at least 80% https://sonarcloud.io/summary/new_code?id=cloudfoundry-identity-parent&pullRequest=1987
some classes, e.g. configuration and RateLimitingFilter are with 0 coverage, please add a test class for the filter class
Added some more unit tests. Line coverage is now above 80% at 81.8% Condition coverage is still only at 62.5%, but then Sonar also want the conditions in Lombok-generated equals and hash code methods to be covered: (s. example) Also the only classes that have larger numbers of uncovered lines are some config classes, that only instantiate other classes.
So in my opinion there are not really meaningful tests left here. If you insist I could of course increase those numbers further by writing those tests, but in my opinion this would add no value.
@torsten-sap FYI because you were not in the meeting today
Hello - Any updates on this PR? I see that most of the recent commits are about adding tests. Is there any progress on using a standard library as @strehle suggested?
@peterhaochen47 The last commit were about the other feedback that @strehle provided, so adopting the configuration and having an IT and better coverage as requested by Sonar - so the PR fulfills all prerequisites for getting merged if this is wanted. If you prefer the solution using some other library, it probably does not make sense to implement this based on the changes in this PR. But this is then a more general discussion of what is wanted, I guess..
from code it look now good, https://sonarcloud.io/summary/new_code?id=cloudfoundry-identity-parent&pullRequest=1987 I have debugged the IT so from usage point of view I only miss a possiblity in uaa-release to configure this for bosh deployment
I spent some time looking in the HTTP 429 return code (Too many requests). This HTTP code appears to be for individual rate limiting only, not the kind we are trying to write here. AFAICT, HTTP 429 means "YOU are sending too many requests, slow down", NOT "other users are taking the whole available bandwidth, hence we are rejecting your request". See also this page about HTTP 429. It seems to me like in this case what we want to return is an HTTP 503 (Service unavailable). The linked documentation states:
Common causes are a server that is down for maintenance or that is overloaded. This response should be used for temporary conditions and the Retry-After HTTP header should, if possible, contain the estimated time for the recovery of the service.
And this makes sense to me. HTTP 4xx codes are for user errors. It's not the user's fault if they can't get a response when others overwhelm the server.
So, to summarize. My current understanding is that you should return 429 (Too many requests) for an rate limit per user, but 503 (Service unavailable) for any other rate limit.
It really looks to me like we should use the 503 code, together with the Retry-After
header to indicate to the client how long they should wait before retrying.
By the way, even the 429 code accepts the Retry-After
header. It should IMO be used for both codes.
Bruce Ricard,
I'm the contractor who wrote the majority of the Rate Limiting code FOR SAP... I am no longer under contract with SAP, and am currently working full time on another project.
I do NOT speak for SAP, and the opinions below are Mine and Mine Alone:
1st the simple and short question/suggestion:
re 429 and 503 status codes -- I think that is a great idea!
2nd re canned/existing library vs all this one-off code:
When the decision was made to add code to the UAA (over multiple other proposals) there was a primary goal:
The rate limiting should default to disabled AND impact as little other code in the UAA as possible!
The default to disabled is there, and IMO, the impact is basically limited to the addition of a new leading filter.
In addition, when it is disabled, the filter processing consists of only one additional test: for limiter status!
Some of the subsequent goals were:
- The Rate Limiting rules could be updated dynamically (via http request).
- Default (local file) Rate Limiting rules should be supported.
- If the Rate Limiting rules configuration "files" (either from a web server or the local file system) are corrupted (can't be fetched or don't parse error free) then use the existing configuration.
- If dynamic updating is active, but there is no usable existing configuration, the library goes into a semi-disabled state, while attempting to fetch a good configuration.
While it is possible that there is an existing library out there that could meet at least the most important goals, and while I can't speak to how much effort was put into a search by the SAP staff, my co-contractor and I did spend about a week looking at options and coming up with proposals. The closest we got to NOT writing code was to use NGENX as a sort of side-car, and this was thoroughly investigated and POC deployed, but eventually abandoned due to our believing that we could NOT extract caller identity from the JOT (and did not know how it was going to easily support defaulting to disabled and supporting dynamic updating of the rules).
George Smith
On Fri, Sep 16, 2022 at 7:03 AM Bruce Ricard @.***> wrote:
I spent some time looking in the HTTP 429 return code (Too many requests) https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429. This HTTP code appears to be for individual rate limiting only, not the kind we are trying to write here. AFAICT, HTTP 429 means "YOU are sending too many requests, slow down", NOT "other users are taking the whole available bandwidth, hence we are rejecting your request". It seems to me like in this case what we want to return is an HTTP 503 (Service unavailable) https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503.
And this makes sense to me. HTTP 4xx codes are for user errors. It's not the user's fault if they can't get a response when others overwhelm the server.
So, to summarize. My current understanding is that you should return 429 (Too many requests) for an rate limit per user, but 503 (Service unavailable) for any other rate limit.
— Reply to this email directly, view it on GitHub https://github.com/cloudfoundry/uaa/pull/1987#issuecomment-1249404780, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABO2UAKVWMTPJFUU4KB6NCLV6R42FANCNFSM55VX3DKQ . You are receiving this because you were mentioned.Message ID: @.***>
-- "And the users exclaimed with a laugh and a taunt: It's just what we asked for but not what we want." -- Unknown
In regards to 503 vs 429:
503 Unavailable is intended for all users (Downtime for XYZ).
429 Is intended for users and machines that request too many threads (Requests).
https://webmasters.stackexchange.com/questions/65674/should-i-return-a-429-or-503-status-code-to-a-bot
@litesoft Thank you for providing contexts, which explain a lot.
There are a bunch of WIP commits from July 14th, around this commit WIP for Testing deployment config file!.
We would strongly prefer not pushing WIP commits to the main branch. Could you maybe squash these WIP commits with the next commit, which fixed the issues?
@tack-sap @strehle, before you merge, could you remove the original doc on this PR? Then cherry-pick this new version of doc: https://github.com/cloudfoundry/uaa/blob/ratelimit-docs/docs/UAA-Rate-Limiting.md to this PR? In that process, please review the new doc and make any necessary edits.
Peter Chen,
Nice update to the UAA-Rate-Limiting.md file!
George
On Wed, Oct 19, 2022 at 9:27 AM Peter Chen @.***> wrote:
@tack-sap https://github.com/tack-sap @strehle https://github.com/strehle, before you merge, could you review this new version of doc: https://github.com/cloudfoundry/uaa/blob/ratelimit-docs/docs/UAA-Rate-Limiting.md & cherry-pick it to this PR? And remove the original doc on this PR?
— Reply to this email directly, view it on GitHub https://github.com/cloudfoundry/uaa/pull/1987#issuecomment-1284276688, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABO2UAO4K65X3T7XGBG2YRLWEAOPRANCNFSM55VX3DKQ . You are receiving this because you were mentioned.Message ID: @.***>
-- "And the users exclaimed with a laugh and a taunt: It's just what we asked for but not what we want." -- Unknown
@peterhaochen47 We agreed in the sync today that we can keep it parallel, especially as the feature will be tagged as experimental in the first release:
Parallelize doc from implementation for 2 PRs.
That works. I'll make a separate PR about the docs.