
Does aibrix support load balancing across managed model endpoints?

Open Colstuwjx opened this issue 9 months ago • 3 comments

For example, is it possible to use the aibrix gateway to load balance across Azure OpenAI endpoints, so that we can take advantage of gateway features like prefix-cache-aware load balancing?

Colstuwjx, Mar 03 '25 03:03

@Colstuwjx

A quick question: with managed endpoints, we have no control over their behavior. Are you assuming that the Azure deployment will automatically do the caching for you if the aibrix gateway routes to an Azure OpenAI endpoint? We'd love to hear more about this use case. Thanks!

Jeffwan, Mar 03 '25 07:03

Hi @Jeffwan

I'm just wondering whether we can try out some of the aibrix features, such as the gateway or autoscaling, without being blocked by the engine runtime. For example, we could use the aibrix gateway as a standalone component to cover gateway-layer requirements, such as routing requests by least connections or round-robin.
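To make the two routing policies concrete, here is a minimal sketch of round-robin and least-connections selection over a fixed list of endpoints. It is not aibrix gateway code: the `Endpoint`, `RoundRobin`, and `LeastConn` names and the example URLs are hypothetical and only illustrate the idea.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Endpoint:
    """A hypothetical backend, e.g. one managed model deployment."""
    url: str
    in_flight: int = 0  # number of requests currently being served


class RoundRobin:
    """Cycle through the endpoints in order, ignoring their load."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(list(endpoints))

    def pick(self) -> Endpoint:
        return next(self._cycle)


class LeastConn:
    """Pick the endpoint with the fewest in-flight requests."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)

    def pick(self) -> Endpoint:
        # min() returns the first endpoint on ties, so the order is stable.
        return min(self._endpoints, key=lambda e: e.in_flight)

    def acquire(self) -> Endpoint:
        """Pick an endpoint and count the new request against it."""
        endpoint = self.pick()
        endpoint.in_flight += 1
        return endpoint

    def release(self, endpoint: Endpoint) -> None:
        """Call when the request finishes."""
        endpoint.in_flight -= 1


if __name__ == "__main__":
    # Placeholder URLs, not real deployments.
    backends = [Endpoint("https://a.example"), Endpoint("https://b.example")]
    rr = RoundRobin(backends)
    print([rr.pick().url for _ in range(3)])

    lc = LeastConn(backends)
    first = lc.acquire()
    print(lc.pick().url)  # the other backend, since `first` is now busy
    lc.release(first)
```

Neither policy needs any cooperation from the backend, which is why they could in principle work against managed endpoints, whereas prefix-cache-aware routing depends on knowing what each backend has cached.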

Colstuwjx, Mar 03 '25 08:03

Some AI gateways support similar behavior as well; for example, Kong AI Gateway supports AI providers plus local models. But as a platform, I'm not convinced at this moment that this is a necessary feature; we may need more feedback. Most of the time this looks like a client-side feature, and AIBrix could act as an alternative backend, but who knows. My two cents.

kerthcet, Mar 03 '25 10:03