Feature Request/Idea: implement `:RateLimitingDefaultCapacityTiers` in the new front-end
Overview of the Feature Request
We're seeing increased traffic from what we believe are LLM scrapers. The current rate-limiting logic applies to API endpoints only; could similar logic help protect the new front-end against such attacks?
What kind of user is the feature intended for?
Sysadmin
What inspired the request?
A deluge of GET requests from a broad swath of client IPv4s, each client making some small (<20) number of requests before switching addresses.
What existing behavior do you want changed?
Database servers lit up like Christmas trees.
Any brand-new behavior you want to add to Dataverse?
Configurable rate limiting in the new front-end, if its current architecture might make this possible in any way.
Any open or closed issues related to this feature request?
https://github.com/IQSS/dataverse/pull/10211
@landreev should be able to confirm, but I think `:RateLimitingDefaultCapacityTiers` is actually applied at the Command layer, which means it would limit both API calls and (current JSF) front-end UI calls, at least for any activity that calls a command.
Correct, as currently implemented, the tier levels are configured for commands.
Two special dummy commands, CheckRateLimitForCollectionPageCommand and CheckRateLimitForDatasetPageCommand, were added specifically to be able to rate-limit the 2 workhorse pages in the current JSF UI. ("dummy" = the commands don't do anything; their sole purpose is to be called from the init() methods of the 2 pages, providing a way to rate-limit the pages.)
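To make the dummy-command pattern concrete, here is a minimal sketch in Python. It is not the Dataverse code: `CommandEngine`, `DatasetPage`, `RateLimitExceeded` and the sliding-window counter are all hypothetical stand-ins (the real limiter may well work differently), and only the command name is borrowed from above. The point is just that the command body is empty and the limiter fires as a side effect of submitting it.

```python
import time
from collections import defaultdict


class RateLimitExceeded(Exception):
    """Raised when a caller has used up its hourly allowance for a command."""


class CommandEngine:
    """Toy engine (hypothetical): checks a per-tier hourly limit, then runs the command."""

    def __init__(self, capacity_by_tier, clock=time.time):
        self.capacity_by_tier = capacity_by_tier  # e.g. {0: 2} -> tier 0 gets 2 calls/hour
        self.clock = clock
        self.calls = defaultdict(list)  # (caller, command name) -> call timestamps

    def submit(self, command, caller, tier):
        key = (caller, type(command).__name__)
        now = self.clock()
        # Sliding window: keep only the calls made within the last hour.
        recent = [t for t in self.calls[key] if now - t < 3600]
        limit = self.capacity_by_tier.get(tier)  # no entry for a tier means unlimited
        if limit is not None and len(recent) >= limit:
            self.calls[key] = recent
            raise RateLimitExceeded(key)
        recent.append(now)
        self.calls[key] = recent
        return command.execute()


class CheckRateLimitForDatasetPageCommand:
    """No-op command: it exists only so that the engine's limiter runs."""

    def execute(self):
        return None


class DatasetPage:
    """Stand-in for a page bean whose init() starts with the rate-limit check."""

    def __init__(self, engine):
        self.engine = engine

    def init(self, caller, tier):
        try:
            self.engine.submit(CheckRateLimitForDatasetPageCommand(), caller, tier)
        except RateLimitExceeded:
            return "rate-limited"
        return "rendered"
```

With `CommandEngine({0: 2})`, the third `init()` call from the same tier-0 caller within an hour returns "rate-limited", while a different caller is unaffected.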
For the purposes of the new front end, one way to go would be to use this same model: create dedicated dummy API calls that do nothing but call the dummy commands, and have the SPA start every page session with a call to the corresponding API.
But it may also be possible to achieve the same result by rate-limiting the actual workload commands, such as GetDataverseCommand and GetLatestPublishedDatasetVersionCommand, since the SPA will presumably call the API methods that use these commands consistently. (The existing JSF Dataverse page does not call any commands to initialize; the dataset page does, but inconsistently, which is why we added the dedicated dummy commands.)
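As a rough illustration of the second option, the sketch below resolves an hourly limit for a command by checking per-command rules first and falling back to a per-tier default. The rule format, the field names (`tier`, `limitPerHour`, `actions`) and all the numbers are assumptions made up for this example; the actual settings format should be taken from the Installation Guide.

```python
def hourly_limit(action, tier, per_action_limits, default_by_tier):
    """Return the hourly limit for (action, tier), or None if nothing applies."""
    # A rule naming this command for this tier takes precedence.
    for rule in per_action_limits:
        if rule["tier"] == tier and action in rule["actions"]:
            return rule["limitPerHour"]
    # Otherwise fall back to the tier's default capacity, if one is configured.
    if 0 <= tier < len(default_by_tier):
        return default_by_tier[tier]
    return None


# Hypothetical values, chosen only for illustration.
DEFAULT_BY_TIER = [10000, 20000]  # index = tier
PER_ACTION = [
    {
        "tier": 0,
        "limitPerHour": 100,
        "actions": ["GetDataverseCommand", "GetLatestPublishedDatasetVersionCommand"],
    },
]
```

Under these made-up values, a tier-0 caller would be held to 100 `GetDataverseCommand` calls per hour, while commands without a rule would fall back to the tier default.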
FWIW: We (QDR) tried rate limiting for unauthenticated users to stop this, but that causes normal unauthenticated users to be rate limited as well, so we went with blocking/throttling individual IPs and IP blocks elsewhere. I'm not sure how the current rate-limiting code can handle bad users without impacting good users if they're using the same calls.
Same here (at IQSS): we haven't been using it much because it is such a nuclear option, with all the human guest users as collateral damage. But we still consider it a last-resort option for serious emergencies.
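A back-of-the-envelope model of why this is hard, using the traffic pattern from the report (each client address makes fewer than 20 requests before switching). It assumes the limit is counted per client address per hour; the 1000 scraper addresses and the 40-page human session are hypothetical numbers, picked only to show the trade-off.

```python
def blocked_requests(requests_per_client, hourly_limit):
    """Requests refused for one client under a per-client hourly limit."""
    return max(0, requests_per_client - hourly_limit)


def summarize(hourly_limit, scraper_clients=1000, scraper_requests_each=19,
              human_requests=40):
    """Compare what gets through for rotating scrapers vs. one human guest."""
    scraper_total = scraper_clients * scraper_requests_each
    scraper_blocked = scraper_clients * blocked_requests(scraper_requests_each, hourly_limit)
    return {
        "scraper_requests_served": scraper_total - scraper_blocked,
        "human_requests_blocked": blocked_requests(human_requests, hourly_limit),
    }
```

In this model, a limit of 20 per hour serves all 19000 scraper requests yet already refuses 20 of the human's 40; dropping it to 5 still serves 5000 scraper requests while refusing 35 of the human's. Any per-client limit low enough to bite the scrapers hits the human guest harder.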