
Load-Balancing API Requests

Open simonkurtz-MSFT opened this issue 2 years ago • 2 comments

Looking at the diagram, I don't see how Azure API Management load-balances requests across multiple Azure OpenAI instances. I would also typically expect an internal Azure Application Gateway between Azure API Management and the Azure OpenAI instances, in addition to the external Azure Application Gateway that is depicted.



simonkurtz-MSFT • Sep 06 '23 18:09

@simonkurtz-MSFT Thanks for your feedback! We will investigate and update as appropriate.

ManoharLakkoju-MSFT • Sep 07 '23 04:09

@simonkurtz-MSFT - We use an APIM policy to load-balance requests across the backend Azure OpenAI resources. Random load balancing is sufficient for many use cases. If you want more sophisticated request routing, you can also use the APIM native cache to persist the state of each resource/model.
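The routing idea above can be sketched in Python. This is a minimal model under stated assumptions, not the APIM policy referenced in the reply: each backend is identified by a URL, one is picked uniformly at random, and a dict of expiry timestamps stands in for the per-backend state that would be kept in the APIM cache (for example with the `cache-store-value` and `cache-lookup-value` policies). Treating HTTP 429 with a `Retry-After` value as the signal to skip a backend is an assumption, and the class name `RandomBackendPool`, the `mark_throttled` helper, and the example hostnames are all hypothetical.

```python
import random
import time
from typing import Dict, List, Optional


class RandomBackendPool:
    """Hypothetical sketch: random backend selection that skips throttled backends.

    `throttled_until` stands in for a shared cache (in APIM, the native cache
    could play this role). It maps backend URL -> epoch seconds until which the
    backend should not receive traffic.
    """

    def __init__(self, backends: List[str], rng: Optional[random.Random] = None):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.throttled_until: Dict[str, float] = {}
        self.rng = rng or random.Random()

    def available(self, now: Optional[float] = None) -> List[str]:
        """Return the backends whose throttle window has expired (or never started)."""
        now = time.time() if now is None else now
        return [b for b in self.backends if self.throttled_until.get(b, 0.0) <= now]

    def pick(self, now: Optional[float] = None) -> str:
        """Choose a backend uniformly at random among the available ones."""
        candidates = self.available(now)
        # Assumed fallback: if every backend is throttled, pick from the full list.
        return self.rng.choice(candidates or self.backends)

    def mark_throttled(self, backend: str, retry_after_s: float,
                       now: Optional[float] = None) -> None:
        """Record that `backend` should be skipped for `retry_after_s` seconds."""
        now = time.time() if now is None else now
        self.throttled_until[backend] = now + retry_after_s


if __name__ == "__main__":
    # Hypothetical endpoints; substitute real Azure OpenAI endpoints.
    pool = RandomBackendPool([
        "https://aoai-eastus.example.invalid",
        "https://aoai-westus.example.invalid",
    ])
    chosen = pool.pick()
    # Assumed behavior: if `chosen` answered HTTP 429 with Retry-After: 10,
    # keep traffic away from it for 10 seconds.
    pool.mark_throttled(chosen, 10)
    print("sent to", chosen, "; next pick:", pool.pick())
```

Falling back to the full list when every backend is throttled is one possible design choice; a gateway could instead return an error to the caller immediately rather than forward the request to a backend that is likely to reject it.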

jakeatmsft • Sep 14 '23 15:09