
Spike: Investigate rate limiting or other controls for AI APIs

Open john-thomas-dotcms opened this issue 1 year ago • 2 comments

We need to investigate and define what it would take to add rate limiting to the dotAI APIs. Established patterns exist for this (for example token bucket, fixed window, and sliding window counters); we need to determine which one(s) fit best.

Rate limiting could be applied in different ways:

  • Overall (number of requests on the whole system).
    • This is far from ideal: a single client could exhaust the shared quota and lock everyone else out, which makes for a simple denial of service attack.
  • To individual users (via unique auth tokens).
  • To individual IPs.
    • This might not make sense in most architectures, since behind a CDN or load balancer every request can appear to come from the proxy's IP - see Requirements, below.

Limits could be determined by different metrics:

  • Number of requests in a given time period.
  • Length of input and/or output.
  • Number of tokens (as defined by OpenAI).
    • This might require a separate call to count the tokens before submitting the request.
  • Delay between subsequent requests.
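
To make the options above concrete, here is a minimal sketch of one candidate pattern, a per-key token bucket. It is not existing dotCMS code: the class name `TokenBucketLimiter`, its constructor parameters, and the idea of using the auth token as the key are all assumptions for illustration. The `cost` argument lets the same limiter count plain requests (cost 1) or an estimated OpenAI token count.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a per-key token bucket; not part of dotCMS. */
final class TokenBucketLimiter {

    /** Mutable state for one key (for example one auth token). */
    private static final class Bucket {
        double available;
        long lastRefillNanos;

        Bucket(double available, long lastRefillNanos) {
            this.available = available;
            this.lastRefillNanos = lastRefillNanos;
        }
    }

    private final double capacity;         // maximum burst size
    private final double refillPerSecond;  // sustained rate
    private final LongSupplier nanoClock;  // injectable clock, e.g. System::nanoTime
    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    TokenBucketLimiter(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
    }

    /** Returns true and deducts {@code cost} if the key still has enough budget. */
    boolean tryAcquire(String key, double cost) {
        final long now = nanoClock.getAsLong();
        final Bucket bucket = buckets.computeIfAbsent(key, k -> new Bucket(capacity, now));
        synchronized (bucket) {
            // Refill in proportion to elapsed time, never above capacity.
            long elapsedNanos = Math.max(0L, now - bucket.lastRefillNanos);
            double refill = (elapsedNanos / 1_000_000_000.0) * refillPerSecond;
            bucket.available = Math.min(capacity, bucket.available + refill);
            bucket.lastRefillNanos = Math.max(now, bucket.lastRefillNanos);

            if (bucket.available >= cost) {
                bucket.available -= cost;
                return true;
            }
            return false;
        }
    }
}
```

A caller might use `new TokenBucketLimiter(60, 1.0, System::nanoTime)` and reject the request (for example with HTTP 429) when `tryAcquire` returns false. Note that this sketch keeps its counters in one JVM; with several dotCMS nodes behind a load balancer, the state would presumably need to live in a shared store for the limits to hold across nodes.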

Requirements:

  • Rate limiting must work when dotCMS is fronted by a CDN.
  • Rate limiting must work when all API requests go through a load balancer.
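
To illustrate why these requirements matter for IP-based limits, here is a rough sketch of recovering the client address from the `X-Forwarded-For` header when the number of trusted proxies in front of dotCMS is known. The class `ClientIpResolver` and the `trustedHops` parameter are hypothetical, and the sketch assumes each trusted proxy appends the address it received the request from, which is the usual convention but should be verified per CDN.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of dotCMS. */
final class ClientIpResolver {

    /**
     * @param xForwardedFor raw X-Forwarded-For header value, may be null
     * @param remoteAddr    address of the direct peer (e.g. the load balancer)
     * @param trustedHops   number of proxies we operate and trust (CDN, LB, ...)
     */
    static String resolve(String xForwardedFor, String remoteAddr, int trustedHops) {
        List<String> chain = new ArrayList<>();
        if (xForwardedFor != null) {
            for (String part : xForwardedFor.split(",")) {
                String ip = part.trim();
                if (!ip.isEmpty()) {
                    chain.add(ip);
                }
            }
        }
        // The direct peer is the last hop in the chain.
        chain.add(remoteAddr);

        // Skip the trusted proxies counting from the right; anything further
        // left was supplied by the client and could be spoofed.
        int index = Math.max(0, chain.size() - 1 - Math.max(0, trustedHops));
        return chain.get(index);
    }
}
```

Counting from the right is deliberate: a client can prepend arbitrary values to the header, so the leftmost entry is not trustworthy on its own. Keying limits on the auth token instead avoids this problem entirely.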

In an ideal implementation:

  • Any rate limits implemented should be configurable (via the App or config props/env vars).
  • The rate limiting method should be something we can re-use for other core dotCMS APIs later.
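
As a sketch of the configurability point, limits could be read from environment variables with safe defaults. The variable names below (`DOT_AI_RATE_LIMIT_ENABLED` and so on), the default values, and the `RateLimitSettings` class are invented for illustration only; the real names would be decided as part of this spike.

```java
import java.util.Map;

/** Hypothetical settings holder; variable names and defaults are assumptions. */
final class RateLimitSettings {

    final boolean enabled;
    final int requestsPerMinute;
    final int maxTokensPerRequest;

    private RateLimitSettings(boolean enabled, int requestsPerMinute, int maxTokensPerRequest) {
        this.enabled = enabled;
        this.requestsPerMinute = requestsPerMinute;
        this.maxTokensPerRequest = maxTokensPerRequest;
    }

    /** Builds settings from an environment map, e.g. System.getenv(). */
    static RateLimitSettings fromEnv(Map<String, String> env) {
        boolean enabled = Boolean.parseBoolean(env.getOrDefault("DOT_AI_RATE_LIMIT_ENABLED", "false"));
        int perMinute = positiveIntOrDefault(env, "DOT_AI_RATE_LIMIT_REQUESTS_PER_MINUTE", 60);
        int maxTokens = positiveIntOrDefault(env, "DOT_AI_RATE_LIMIT_MAX_TOKENS", 4096);
        return new RateLimitSettings(enabled, perMinute, maxTokens);
    }

    /** Falls back to the default when the value is missing, malformed, or not positive. */
    private static int positiveIntOrDefault(Map<String, String> env, String name, int fallback) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Taking a `Map` rather than calling `System.getenv()` directly keeps the parsing testable, and the same holder could later be populated from the App configuration or from config props for other core APIs.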

john-thomas-dotcms • Feb 02 '24 16:02