Documentation does not clearly explain how to configure rate limiting, how to measure token consumption, or how to enable authentication for different users
🐛 Describe the bug
The documentation at https://aibrix.readthedocs.io/latest/features/gateway-plugins.html# only describes the feature, not how to use it.
Steps to Reproduce
https://aibrix.readthedocs.io/latest/features/gateway-plugins.html#
Expected behavior
Proper documentation on how to enable rate limiting and how to control model access, with the endpoints used to configure these features clearly called out.
Environment
All AIBrix versions
To unblock you, I am adding the details here; I will also add them to the documentation.
- We have a separate metadata service, so it needs its own port forwarding. Moving it under the gateway umbrella is a work in progress and will remove the need for separate port forwarding.
kubectl -n aibrix-system port-forward svc/aibrix-metadata-service 8090:8090 &
- Create a user and specify its RPM (requests per minute) and TPM (tokens per minute) limits. For more details, please check out pkg/metadata/README.md.
curl http://localhost:8090/CreateUser \
-H "Content-Type: application/json" \
-d '{"name": "your-user-name","rpm": 100,"tpm": 1000}'
- Inference request: same as the quick start; you only need to add the user header.
curl -v http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-H "user: your-user-name" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
- For user configuration, we are also looking for feedback from the community on how companies generally handle user and quota management.
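The steps above can be sketched as a small Python client. This is a hypothetical helper, not part of AIBrix: it assumes the port-forwards used above (metadata service on localhost:8090, gateway on localhost:8888), that /CreateUser accepts exactly the JSON fields shown in the curl example, and that chat responses carry an OpenAI-style `usage` object with a `total_tokens` field. Since per-user usage is not exposed in gateway metrics, the hypothetical `UsageTally` class keeps per-user totals on the client side.

```python
"""Hypothetical client helpers for the AIBrix steps above (not an official API)."""
import json
import urllib.request
from collections import defaultdict

METADATA_URL = "http://localhost:8090"  # from the kubectl port-forward above
GATEWAY_URL = "http://localhost:8888"   # gateway address used in the curl example


def build_create_user_request(name: str, rpm: int, tpm: int,
                              base_url: str = METADATA_URL) -> urllib.request.Request:
    """Build the same POST that the `curl .../CreateUser` command sends."""
    body = json.dumps({"name": name, "rpm": rpm, "tpm": tpm}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/CreateUser",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chat_request(user: str, model: str, prompt: str, api_key: str = "any_key",
                       base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build a chat completion request; the `user` header names the rate-limited user."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # key for the model, not tied to the user
            "user": user,
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class UsageTally:
    """Client-side per-user totals of requests and tokens."""

    def __init__(self) -> None:
        self.tokens = defaultdict(int)
        self.requests = defaultdict(int)

    def record(self, user: str, response: dict) -> None:
        # Assumes an OpenAI-style "usage" object; a missing one counts as 0 tokens.
        usage = response.get("usage") or {}
        self.tokens[user] += usage.get("total_tokens", 0)
        self.requests[user] += 1
```

For example, `send(build_create_user_request("your-user-name", 100, 1000))` creates the user, and passing the result of `send(build_chat_request("your-user-name", "llama2-7b", "Say this is a test!"))` to `tally.record("your-user-name", ...)` accumulates that user's tokens. Requests are built separately from `send` so they can be inspected without a running cluster.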
In the create-user step above, how is the authentication key managed? How do I assign an authentication key to each user?
Also, how will we measure per-user usage metrics, such as token consumption, for each user created and their authentication key?
The authentication key present in the request is for the model and is not associated with a user. Right now, user creation is a fairly naive implementation and needs more work to formalize it.
The gateway internally tracks token consumption in a one-minute window and validates it to prevent a user from over-consuming tokens or requests within that window. Currently, per-user token and request usage is not exposed in metrics.
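A minimal sketch of the one-minute window check described above, assuming a simple fixed-window counter per user. The gateway's actual implementation is not shown in this thread; `FixedWindowLimiter`, `Limits`, and their methods are hypothetical names. Because a request's token count is only known after it completes, the sketch rejects new requests once the window's RPM or TPM budget is used up, and adds tokens afterwards via `record_tokens`.

```python
"""Hypothetical per-user RPM/TPM check in fixed one-minute windows (illustration only)."""
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Limits:
    rpm: int  # max requests per window
    tpm: int  # max tokens per window


@dataclass
class _Window:
    start: int = -1  # index of the current window (now // window_seconds)
    requests: int = 0
    tokens: int = 0


@dataclass
class FixedWindowLimiter:
    limits: Dict[str, Limits]
    window_seconds: int = 60
    _state: Dict[str, _Window] = field(default_factory=dict)

    def _window(self, user: str, now: float) -> _Window:
        idx = int(now // self.window_seconds)
        w = self._state.setdefault(user, _Window())
        if w.start != idx:  # a new window started: reset the counters
            w.start, w.requests, w.tokens = idx, 0, 0
        return w

    def allow_request(self, user: str, now: float) -> bool:
        """Reject if RPM or TPM is already used up; otherwise count the request."""
        lim = self.limits[user]
        w = self._window(user, now)
        if w.requests >= lim.rpm or w.tokens >= lim.tpm:
            return False
        w.requests += 1
        return True

    def record_tokens(self, user: str, tokens: int, now: float) -> None:
        """Add the tokens reported for a completed request to the current window."""
        self._window(user, now).tokens += tokens
```

A fixed window is the simplest choice; it allows bursts around window boundaries, which a sliding window would smooth out at the cost of more bookkeeping.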
@varungup90 Thanks. I think this gateway feature would benefit from mapping model authentication to each user or group of users, and from exposing the associated token consumption via usage metrics.
With respect to the current implementation, how do I configure authentication for a model? That is also missing from the documentation.
Could you please help me understand how to create a user with an authentication key mapped to a model (as per your current feature) and associate that user with RPM and TPM limits?
Thanks
@vivekrsintc The features you have listed are not present right now. Before implementing them, I need a better understanding of the requirements. I could not find you on the AIBrix Slack channel. Can you ping me there?