core Spike: Investigate rate limiting or other controls for AI APIs

Open john-thomas-dotcms opened this issue 1 year ago • 2 comments

We need to investigate and define what it would take to add rate limiting to the dotAI APIs. There are sure to be patterns we can use for this; we need to figure out which one(s) would make the most sense.

Rate limiting could be applied in different ways:

- Overall (number of requests on the whole system).
  - This is far from ideal: a single client could use up the system-wide quota and deny service to everyone else.
- To individual users (via unique auth tokens).
- To individual IPs.
  - This might not make sense in most architectures; see Requirements, below.
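
To make the per-user option concrete, here is a minimal sketch of a fixed-window counter keyed by an arbitrary string such as an auth token. The class name `PerKeyRateLimiter` and its method are hypothetical and not part of dotCMS. State is held in memory, so this only limits a single node; a cluster behind a load balancer would need a shared store.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: allows at most maxRequests per window for each key. */
final class PerKeyRateLimiter {

    /** Immutable snapshot of one key's current window. */
    private static final class Window {
        final long start;
        final int count;

        Window(long start, int count) {
            this.start = start;
            this.count = count;
        }
    }

    private final int maxRequests;
    private final long windowMillis;
    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    PerKeyRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request for this key (e.g. an auth token) is allowed. */
    boolean tryAcquire(String key, long nowMillis) {
        // compute() runs atomically per key, so concurrent requests are counted once each.
        Window updated = windows.compute(key, (k, w) -> {
            if (w == null || nowMillis - w.start >= windowMillis) {
                return new Window(nowMillis, 1); // first request of a fresh window
            }
            if (w.count > maxRequests) {
                return w; // already over the limit; stop counting
            }
            return new Window(w.start, w.count + 1);
        });
        return updated.count <= maxRequests;
    }
}
```

A caller would pass the current time, for example `limiter.tryAcquire(token, System.currentTimeMillis())`. A fixed window permits short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.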

Limits could be determined by different metrics:

- Number of requests in a given time period.
- Length of input and/or output.
- Number of tokens (as defined by OpenAI).
  - This might require a separate call to count the tokens before submitting the request.
- Delay between subsequent requests.
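
Several of these metrics can share one mechanism if each request is charged a cost against a refilling budget. The sketch below is a token bucket with a variable cost; `CostBucket` is a hypothetical name, and what the cost represents (1 per request, input length, or an OpenAI token count obtained elsewhere) is an assumption left to the caller.

```java
/** Hypothetical sketch: a token bucket where each request consumes a caller-supplied cost. */
final class CostBucket {
    private final double capacity;       // maximum budget that can accumulate
    private final double refillPerMilli; // budget regained per millisecond
    private double available;
    private long lastRefillMillis;

    CostBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.available = capacity; // start full
        this.lastRefillMillis = nowMillis;
    }

    /** Charges the cost and returns true if enough budget is available; otherwise charges nothing. */
    synchronized boolean tryConsume(double cost, long nowMillis) {
        // Refill in proportion to elapsed time, never exceeding capacity.
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        available = Math.min(capacity, available + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);

        if (cost > available) {
            return false;
        }
        available -= cost;
        return true;
    }
}
```

With a cost of 1 this limits request count; with the prompt length or a token estimate as the cost it limits volume. A capacity of 1 with a slow refill approximates a minimum delay between requests.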

Requirements:

- Rate limiting must work when dotCMS is fronted by a CDN.
- Rate limiting must work when all API requests go through a load balancer.
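
These requirements are why per-IP limiting is awkward: behind a CDN or load balancer, the socket address the server sees is the proxy's, not the client's. One common approach, sketched below as an assumption rather than a proposal, is to read the `X-Forwarded-For` header and take the right-most address that is not a known proxy. `ClientIpResolver` and the trusted-proxy set are hypothetical; how that set would be populated is not addressed here.

```java
import java.util.Set;

/** Hypothetical sketch: derive a client IP from X-Forwarded-For, trusting only known proxies. */
final class ClientIpResolver {

    /**
     * @param forwardedFor   raw X-Forwarded-For header value, or null if absent
     * @param remoteAddr     address of the directly connected peer
     * @param trustedProxies addresses of the CDN / load balancer hops we control or trust
     */
    static String resolve(String forwardedFor, String remoteAddr, Set<String> trustedProxies) {
        // If the peer is not a trusted proxy, the header may be forged: ignore it.
        if (forwardedFor == null || forwardedFor.trim().isEmpty()
                || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        // Each proxy appends the address it saw, so walk from right (nearest) to left.
        String[] hops = forwardedFor.split(",");
        String candidate = remoteAddr;
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (hop.isEmpty()) {
                continue;
            }
            candidate = hop;
            if (!trustedProxies.contains(candidate)) {
                return candidate; // first hop not under our control
            }
        }
        return candidate; // every hop was a trusted proxy; fall back to the left-most one
    }
}
```

Taking the left-most entry instead would let a client choose its own "IP" by sending a forged header, which defeats the limit. Even with this handling, many users can share one address, so auth tokens remain the more reliable key.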

In an ideal implementation:

- Any rate limits implemented should be configurable (via the App or config props/env vars).
- The rate limiting method should be something we can re-use for other core dotCMS APIs later.
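
As an illustration of the configurability goal, the sketch below reads limits from a map of environment variables with safe defaults. The variable names (`DOT_AI_RATE_LIMIT_*`), the default values, and the `RateLimitSettings` class are all hypothetical; the real keys and the App integration would be decided by the spike.

```java
import java.util.Map;

/** Hypothetical sketch: rate limit settings loaded from env vars, falling back to defaults. */
final class RateLimitSettings {
    final boolean enabled;
    final long maxRequests;
    final long windowSeconds;

    private RateLimitSettings(boolean enabled, long maxRequests, long windowSeconds) {
        this.enabled = enabled;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    /** Pass System.getenv() in production, or a plain map in tests. */
    static RateLimitSettings fromEnv(Map<String, String> env) {
        boolean enabled = Boolean.parseBoolean(
                env.getOrDefault("DOT_AI_RATE_LIMIT_ENABLED", "true"));
        long maxRequests = positiveOrDefault(env, "DOT_AI_RATE_LIMIT_MAX_REQUESTS", 60);
        long windowSeconds = positiveOrDefault(env, "DOT_AI_RATE_LIMIT_WINDOW_SECONDS", 60);
        return new RateLimitSettings(enabled, maxRequests, windowSeconds);
    }

    /** Missing, non-numeric, or non-positive values fall back to the default. */
    private static long positiveOrDefault(Map<String, String> env, String name, long fallback) {
        String raw = env.get(name);
        if (raw == null) {
            return fallback;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Keeping the settings free of any AI-specific logic is one way to let the same limiter be reused for other APIs later, with a different key prefix per API.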

Feb 02 '24 16:02 john-thomas-dotcms
