Investigate adding Apache-level mechanism for rejecting aggressive robot crawling
This is a narrow case of the overall "rate limiting" umbrella issue. It would not attempt to throttle overall traffic to the site, or to regulate the rate of requests from "normal" users (that would be better done within the application). Instead, this mechanism would serve as the first line of defense for detecting obvious bot/scripted or otherwise automated crawling - for example, repeated calls from the same IP plowing through the collection page facets without pausing between calls - and rejecting it before it even reaches the application.
This would essentially automate what we periodically do with custom command-line scripts in our production environment. Third-party tools for addressing this common problem should be readily available; a rough sketch of one option follows below.
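One candidate (an assumption for illustration only - neither the module nor the thresholds below come from this discussion) is mod_evasive, which temporarily blocks clients that hit the same URI, or the site as a whole, too many times within a short interval. A minimal httpd config sketch, with placeholder values that would need tuning against our actual crawler traffic:

```apacheconf
# Sketch only; module name varies by build (mod_evasive20.c vs. mod_evasive24.c),
# and all thresholds below are illustrative, not tuned values.
<IfModule mod_evasive24.c>
    DOSHashTableSize    3097
    # Block a client requesting the same URI more than 10 times in 1 second
    DOSPageCount        10
    DOSPageInterval     1
    # Block a client making more than 100 requests site-wide in 1 second
    DOSSiteCount        100
    DOSSiteInterval     1
    # Keep returning 403 to the offending IP for 60 seconds after it trips a threshold
    DOSBlockingPeriod   60
    # Never block known-good addresses, e.g. internal monitoring
    DOSWhitelist        127.0.0.1
    DOSLogDir           /var/log/mod_evasive
</IfModule>
```

Something like this would only catch the crudest case (a single IP hammering faceted pages without pausing); distributed or slower crawlers would still need the application-side solution.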
2024/03/14
- Currently waiting for input from @stevenwinship
@cmbz I meant more along the lines of "waiting until we deploy 6.2 - that will include Steven's application-side rate limiting solution - and experiment with it to see if that addresses the problem at hand, thus making an Apache-level solution unnecessary".
2024/03/27
- Status: currently waiting; we will see how the updates in 6.2 affect performance.
2024/08/15
- Assigning to @landreev and placing on hold. We will review again at the next monthly meeting. @landreev will let us know when the work should move forward.