Investigate adding Apache-level mechanism for rejecting aggressive robot crawling
This is a narrow case of the overall "rate limiting" umbrella issue. It would not attempt to throttle overall traffic to the site, or to regulate the rate of requests from "normal" users (that is better done within the application). This mechanism would instead serve as the first line of defense, detecting obvious bot/scripted or otherwise automated crawling - for example, repeated calls from the same IP plowing through the collection page facets without pausing between calls - and rejecting it before it even reaches the application.
This would do essentially what we periodically do in production with custom command-line scripts today, but third-party tools should be readily available for addressing this common problem (see the sketch below).
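As one candidate worth investigating, here is a minimal sketch of what an Apache-level defense could look like with mod_evasive, a readily available module that blocks IPs issuing too many requests in a short interval. The thresholds and paths below are placeholder values for illustration, not a tested recommendation:

```apache
# Load mod_evasive (module filename varies by distro/packaging,
# e.g. mod_evasive24.so from EPEL on RHEL-family systems)
LoadModule evasive20_module modules/mod_evasive24.so

<IfModule mod_evasive24.c>
    # Size of the per-child hash table used to track client IPs
    DOSHashTableSize    3097
    # Block an IP requesting the same URI more than 5 times within a 1-second interval
    DOSPageCount        5
    DOSPageInterval     1
    # Block an IP making more than 100 requests to the site within a 1-second interval
    DOSSiteCount        100
    DOSSiteInterval     1
    # How long (in seconds) a blocked IP keeps receiving 403 responses
    DOSBlockingPeriod   60
    # Never block trusted addresses, e.g. internal monitoring
    DOSWhitelist        127.0.0.1
</IfModule>
```

Other Apache modules such as mod_qos and mod_security offer similar capabilities and could also be evaluated as part of this investigation; whichever tool is chosen, the thresholds would need tuning so that legitimate heavy users are not blocked.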