Does AIBrix support load balancing against managed model endpoints?
For example, is it possible to use the AIBrix gateway to load-balance across Azure OpenAI endpoints, so that we can take advantage of gateway features like prefix-cache-aware load balancing?
@Colstuwjx
A quick question: for managed endpoints, we don't have any control over the backend's behavior. Are you assuming that the Azure deployment will automatically cache the prefix for you if the AIBrix gateway routes to an Azure OpenAI endpoint? We'd love to hear more about this use case. Thanks!
Hi @Jeffwan
I'm just wondering whether we can try out a subset of AIBrix features, like the gateway or autoscaling, without being blocked by the engine runtime. For example, using the AIBrix gateway as a standalone component to cover gateway-layer requirements such as routing requests by least-connections or round-robin.
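To make the least-connections idea concrete, here is a minimal sketch of that policy in Python. This is not AIBrix code or configuration; the class name, endpoint URLs, and methods are all hypothetical, just illustrating what "route by least conn" means at the gateway layer.

```python
import threading


class LeastConnBalancer:
    """Hypothetical sketch: pick the endpoint with the fewest in-flight requests.

    The endpoint URLs are placeholders for managed endpoints (e.g. Azure
    OpenAI deployments); this is not real AIBrix gateway configuration.
    """

    def __init__(self, endpoints):
        self._lock = threading.Lock()
        # Track the number of in-flight requests per endpoint.
        self._inflight = {ep: 0 for ep in endpoints}

    def acquire(self):
        # Choose the endpoint with the smallest in-flight count and
        # mark one more request as in flight on it.
        with self._lock:
            ep = min(self._inflight, key=self._inflight.get)
            self._inflight[ep] += 1
            return ep

    def release(self, ep):
        # Call when the request to `ep` completes.
        with self._lock:
            self._inflight[ep] -= 1
```

Usage would look like `ep = lb.acquire()`, send the request to `ep`, then `lb.release(ep)` when the response finishes. Round-robin would be the same shape with a rotating index instead of the in-flight counts.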
Some AI gateways support similar behavior; for example, Kong AI Gateway supports external AI providers alongside local models. As a platform, though, I'm not convinced at this moment that this is a necessary feature; we may need more feedback. Most of the time this looks like a client-side feature, and AIBrix could behave as an alternative backend, but who knows. My two cents.