AI Chat: lower request limit per minute
We currently allow 50 requests per minute to the AI Chat endpoint (this covers requests served by either OpenAI or SageMaker). That seems high considering what a legitimate user could plausibly do in a minute given wait times, typing time, etc., so I'm proposing we lower the limit to protect our shared Active Job infrastructure.
Looking at data from our usage of OpenAI as a base model, p90 response time (i.e., 90% of requests complete within this time) is between 2 and 3 seconds. That's without the time the user actually needs to write their message.
SageMaker is (surprisingly) a bit faster than what we're seeing from OpenAI (maybe b/c our OpenAI usage supports images/PDFs?): I see p90 response times under a second there. Given both, a limit of 30/minute should be sufficient.
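As a rough sanity check on the 30/minute figure, here is a back-of-envelope sketch. It assumes a user sends one message at a time and waits for each response before sending the next. The helper name `max_requests_per_minute` and the 1-second typing time are hypothetical, chosen for illustration only; the response times are the approximate p90 values quoted above.

```ruby
# Back-of-envelope ceiling on sequential chat requests per user per minute.
# Assumes one request in flight at a time; typing times are illustrative guesses.
SECONDS_PER_MINUTE = 60.0

def max_requests_per_minute(response_seconds, typing_seconds = 0.0)
  cycle = response_seconds + typing_seconds
  raise ArgumentError, "cycle time must be positive" unless cycle.positive?

  (SECONDS_PER_MINUTE / cycle).floor
end

# OpenAI at ~2-3 s p90, with no typing time at all
puts max_requests_per_minute(2.0)      # => 30
puts max_requests_per_minute(3.0)      # => 20

# SageMaker at ~1 s (upper bound of "under a second"), plus an assumed 1 s of typing
puts max_requests_per_minute(1.0, 1.0) # => 30
```

Under these assumptions, 30/minute sits at or above what a single user sending messages sequentially would reach. The SageMaker case depends on the assumed typing time, so that number is worth checking against real usage data.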