AI Chat: lower request limit per minute

Open bencodeorg opened this issue 7 months ago • 0 comments

We currently allow 50 requests per minute to the AI Chat endpoint (this covers requests served by either OpenAI or Sagemaker). This seems a bit high considering what a legitimate user could plausibly do in a minute given wait times, time for typing, etc., so I'm proposing we lower that limit a bit to protect our shared Active Job infrastructure.
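For anyone unfamiliar with how a per-minute cap like this behaves, here is a minimal sliding-window sketch in Python. It is illustrative only and is not the limiter our endpoint actually uses; the class name `PerMinuteLimiter` and its parameters are hypothetical.

```python
import time
from collections import deque


class PerMinuteLimiter:
    """Hypothetical sliding-window limiter: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit=30, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self.timestamps = deque()   # times of the requests accepted inside the current window

    def allow(self):
        now = self.clock()
        # Drop accepted requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False            # over the limit: reject this request
        self.timestamps.append(now)
        return True
```

In practice one limiter instance would be kept per user (or per session), and a rejected request would get an error response instead of being enqueued.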

Looking at data from our usage of OpenAI as a base model, p90 response time (i.e., in this data a higher percentile means a lower response time) is between 2 and 3 seconds. That's without the time the user actually needs to write their message.

Sagemaker is (surprisingly) a bit faster than what we're seeing from OpenAI (maybe b/c our OpenAI usage supports images/PDFs?). I see p90 response times under a second there, so I think a limit of 30/minute should be sufficient.
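As a rough sanity check on the numbers above, here is a back-of-the-envelope sketch in Python. It assumes a user sends requests one at a time and waits for each response before sending the next; the function name and the 1-second typing time are assumptions for illustration, not measured values.

```python
def max_requests_per_minute(response_s, typing_s=0.0):
    """Upper bound on sequential requests one user can make in 60 seconds."""
    return 60.0 / (response_s + typing_s)


# OpenAI p90 range from above, with zero typing time (the most generous case).
print(max_requests_per_minute(2.0))  # 30.0
print(max_requests_per_minute(3.0))  # 20.0

# Sagemaker: 1 s response (upper end of "under a second") plus an assumed 1 s of typing.
print(max_requests_per_minute(1.0, typing_s=1.0))  # 30.0
```

Under these assumptions the OpenAI path tops out at 20 to 30 requests per minute even with no typing time. For Sagemaker, the sub-second responses alone would allow more than 30 per minute, so the 30/minute figure relies on users spending at least about a second composing each message.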

bencodeorg, May 06 '25 22:05