Spike: Investigate rate limiting or other controls for AI APIs
We need to investigate and define what it would take to add rate limiting to the dotAI APIs. Well-established patterns exist for this, such as fixed-window counters, sliding windows, and token buckets. We need to determine which one(s) would fit best.
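Rate limiting could be applied in different ways:
- Overall (number of requests on the whole system).
  - This is far from ideal because it leaves the system open to a simple denial-of-service attack: a single client can use up the global quota and lock out everyone else.
- To individual users (via unique auth tokens).
- To individual IPs.
  - This might not make sense in most architectures (see Requirements, below). Behind a CDN or load balancer, every request appears to come from the proxy's IP unless the original client IP is recovered from a forwarding header such as X-Forwarded-For.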
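The snippet below is a minimal sketch of how a request could be mapped to the key that a limit is counted against. It assumes the auth layer has already validated the token and exposes its id. It also assumes a configured list of trusted proxy IPs. `ClientKeyResolver` and the `token:`/`ip:` key prefixes are hypothetical names, not existing dotCMS code. The sketch prefers the user identity. If there is none, it falls back to the client IP and reads X-Forwarded-For only when the request really came from a trusted proxy.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper that chooses the key a rate limit is counted against.
 * Prefers an authenticated identity (auth token id); otherwise falls back to a client IP.
 */
final class ClientKeyResolver {
    // Assumption: IPs of our own CDN edges / load balancers, supplied by configuration
    private final List<String> trustedProxies;

    ClientKeyResolver(List<String> trustedProxies) {
        this.trustedProxies = List.copyOf(trustedProxies);
    }

    /**
     * @param authTokenId id of the validated auth token, if the request carried one
     * @param remoteAddr  address of the TCP peer (the proxy, when behind a CDN or load balancer)
     * @param headers     request headers; assumes the caller normalized header-name casing
     */
    String resolve(Optional<String> authTokenId, String remoteAddr, Map<String, String> headers) {
        if (authTokenId.isPresent()) {
            return "token:" + authTokenId.get();
        }
        String forwarded = headers.get("X-Forwarded-For");
        // Only trust the header when the request actually came from one of our proxies;
        // otherwise any client could set it and choose its own rate-limit key.
        if (forwarded != null && trustedProxies.contains(remoteAddr)) {
            String[] hops = forwarded.split(",");
            // Walk right to left: trailing entries were appended by our own proxies, and the
            // first untrusted address is the real client. Left-most values may be spoofed.
            for (int i = hops.length - 1; i >= 0; i--) {
                String hop = hops[i].trim();
                if (!hop.isEmpty() && !trustedProxies.contains(hop)) {
                    return "ip:" + hop;
                }
            }
        }
        return "ip:" + remoteAddr;
    }
}
```

Under these assumptions, a user with a token is always limited by identity, however many IPs they use. Anonymous traffic behind the CDN is limited per real client rather than per proxy.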
Limits could be determined by different metrics:
- Number of requests in a given time period.
- Length of input and/or output.
- Number of tokens (as defined by OpenAI).
  - This might require a separate call to count the tokens before submitting the request.
- Minimum delay between consecutive requests.
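A single token-bucket limiter can cover several of these metrics if each request is charged a variable "cost". A cost of 1 limits the request count. A cost equal to the (estimated) token count limits token volume. The sketch below is hypothetical and not an existing dotCMS class. It keeps state in one JVM, and its `estimateTokens` helper uses the rough "about 4 characters per token" heuristic for English text rather than a real tokenizer. An exact count would need the model's own tokenizer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Minimal token-bucket limiter keyed by client. Each key holds up to `capacity` units,
 * refilled continuously at `refillPerSecond`. A request spends `cost` units.
 */
final class TokenBucketLimiter {
    private static final class Bucket {
        double available;
        long lastRefillNanos;

        Bucket(double available, long now) {
            this.available = available;
            this.lastRefillNanos = now;
        }
    }

    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier clockNanos; // injected so tests can control time
    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    TokenBucketLimiter(double capacity, double refillPerSecond, LongSupplier clockNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clockNanos = clockNanos;
    }

    /** Deducts `cost` and returns true if the key has enough budget; otherwise returns false. */
    boolean tryAcquire(String key, double cost) {
        long now = clockNanos.getAsLong();
        Bucket b = buckets.computeIfAbsent(key, k -> new Bucket(capacity, now));
        synchronized (b) {
            long elapsed = now - b.lastRefillNanos;
            if (elapsed > 0) { // ignore out-of-order timestamps from concurrent callers
                b.available = Math.min(capacity, b.available + elapsed / 1e9 * refillPerSecond);
                b.lastRefillNanos = now;
            }
            if (b.available >= cost) {
                b.available -= cost;
                return true;
            }
            return false;
        }
    }

    /** Rough estimate only (about 4 characters per token for English text), minimum 1. */
    static int estimateTokens(String text) {
        return Math.max(1, (text.length() + 3) / 4);
    }
}
```

For example, `tryAcquire(key, 1)` enforces a request rate. `tryAcquire(key, TokenBucketLimiter.estimateTokens(prompt))` enforces a token budget. A bucket with capacity 1 and a refill rate of 1/d per second enforces a minimum gap of d seconds between requests. A request whose cost exceeds `capacity` can never pass, so the token budget would need to be sized above the largest allowed prompt.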
Requirements:
- Rate limiting must work when dotCMS is fronted by a CDN.
- Rate limiting must work when all API requests go through a load balancer.
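Behind a load balancer, consecutive requests from one client can land on different dotCMS nodes. A per-JVM counter would therefore let each client through once per node. One way to meet both requirements is to keep counters in a store that every node shares. The sketch below assumes such a store behind a hypothetical `SharedCounterStore` interface. A Redis `INCR` plus `EXPIRE` on the key is one common backing. The in-memory class is only a single-JVM stand-in for testing. The limiter itself is a fixed-window counter, and all names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical abstraction over a counter store shared by every node in the cluster. */
interface SharedCounterStore {
    /** Atomically increments `key` and returns the new value; the key should expire after `ttlSeconds`. */
    long incrementAndGet(String key, long ttlSeconds);
}

/** Single-JVM stand-in for tests; a real cluster needs a store all nodes can see. */
final class InMemoryCounterStore implements SharedCounterStore {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long incrementAndGet(String key, long ttlSeconds) {
        // TTL is ignored here; keys for past windows are simply never read again
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

/** Fixed-window limiter: at most `limit` requests per client key per window. */
final class FixedWindowLimiter {
    private final SharedCounterStore store;
    private final long limit;
    private final long windowSeconds;
    private final LongSupplier epochSeconds; // injected clock

    FixedWindowLimiter(SharedCounterStore store, long limit, long windowSeconds, LongSupplier epochSeconds) {
        this.store = store;
        this.limit = limit;
        this.windowSeconds = windowSeconds;
        this.epochSeconds = epochSeconds;
    }

    boolean allow(String clientKey) {
        long window = epochSeconds.getAsLong() / windowSeconds;
        // Every node derives the same key for the same client and window, so counts are shared
        long count = store.incrementAndGet("ratelimit:dotai:" + clientKey + ":" + window, windowSeconds);
        return count <= limit;
    }
}
```

A fixed window is simple to run on a shared store, but it allows bursts of up to twice the limit across a window boundary. A sliding window or a token bucket kept in the same store would smooth that out, at the cost of more state per key.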
In an ideal implementation:
- Any rate limits implemented should be configurable (via the App or config props/env vars).
- The rate limiting method should be something we can re-use for other core dotCMS APIs later.
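To keep limits configurable and reusable, the settings could be resolved by API prefix. The same code would then read `DOT_AI_*` values now and another prefix for other core APIs later. The sketch below is hypothetical. The variable names and default values are illustrative, not existing dotCMS properties. App-level settings could override these values in a similar way.

```java
import java.util.Map;

/**
 * Hypothetical per-API rate-limit settings resolved from environment-style key/value pairs.
 * Names such as DOT_AI_RATE_LIMIT_REQUESTS are illustrative, not existing dotCMS settings.
 */
record RateLimitConfig(boolean enabled, long requestsPerWindow, long windowSeconds, long tokensPerWindow) {

    static RateLimitConfig fromEnv(String prefix, Map<String, String> env) {
        return new RateLimitConfig(
                Boolean.parseBoolean(env.getOrDefault(prefix + "_RATE_LIMIT_ENABLED", "false")),
                parsePositive(env, prefix + "_RATE_LIMIT_REQUESTS", 60),           // illustrative default
                parsePositive(env, prefix + "_RATE_LIMIT_WINDOW_SECONDS", 60),     // illustrative default
                parsePositive(env, prefix + "_RATE_LIMIT_TOKENS", 100_000));       // illustrative default
    }

    /** Returns the positive long stored under `name`, or `fallback` if missing, malformed, or not positive. */
    private static long parsePositive(Map<String, String> env, String name, long fallback) {
        String raw = env.get(name);
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback; // a malformed value falls back instead of failing startup
        }
    }
}
```

At startup this might be called as `RateLimitConfig.fromEnv("DOT_AI", System.getenv())`. Another API would pass its own prefix and reuse the same limiter code.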