architecture-center Load-Balancing API Requests

Status: Open. simonkurtz-MSFT opened this issue 2 years ago (2 comments).

Looking at the diagram, I don't see how Azure API Management would load-balance across multiple Azure OpenAI instances. I would also typically expect to see an internal Azure Application Gateway between Azure API Management and the Azure OpenAI instances, in addition to the external Azure Application Gateway shown in the diagram.

Document Details

⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

ID: 59e64035-d0cb-d2cf-0848-4153b5cfe876
Version Independent ID: 59e64035-d0cb-d2cf-0848-4153b5cfe876
Content: Implement logging and monitoring for Azure OpenAI large language models - Azure Architecture Center
Content Source: docs/ai-ml/openai/architecture/log-monitor-azure-openai.yml
Service: architecture-center
Sub-service: example-scenario
GitHub Login: @jakeatmsft
Microsoft Alias: jacwang

Sep 06 '23 18:09 simonkurtz-MSFT

@simonkurtz-MSFT Thanks for your feedback! We will investigate and update as appropriate.

Sep 07 '23 04:09 ManoharLakkoju-MSFT

@simonkurtz-MSFT - We use an APIM policy to load-balance requests across the backend Azure OpenAI resources. Random load balancing is sufficient for many use cases. If you want more sophisticated request routing, you can also use the APIM built-in cache to persist the state of each resource/model.

Sep 14 '23 15:09 jakeatmsft
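The reply above describes random load balancing in an APIM policy, optionally using the APIM cache to remember per-backend state. The Python sketch below illustrates only that routing decision; it is not APIM policy syntax and not the policy used in the article. It assumes, hypothetically, that a backend returning HTTP 429 should be skipped for its Retry-After period. The `BackendPool` class, the backend URLs, and the dictionary standing in for the APIM cache are all illustrative.

```python
from __future__ import annotations

import random
import time
from dataclasses import dataclass, field


@dataclass
class BackendPool:
    """Hypothetical model of the routing decision: pick an Azure OpenAI
    backend at random, skipping backends recently marked as throttled."""

    backends: list[str]
    # Stand-in for the APIM cache: backend URL -> monotonic time (seconds)
    # until which that backend should be skipped.
    throttled_until: dict[str, float] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def available(self, now: float) -> list[str]:
        """Backends whose throttle window has expired (or never started)."""
        return [b for b in self.backends if self.throttled_until.get(b, 0.0) <= now]

    def pick(self, now: float | None = None) -> str:
        """Uniform random choice among the available backends."""
        now = time.monotonic() if now is None else now
        candidates = self.available(now)
        # If every backend is throttled, fall back to the full pool so the
        # request is still forwarded rather than rejected here.
        return self.rng.choice(candidates or self.backends)

    def mark_throttled(self, backend: str, retry_after_s: float,
                       now: float | None = None) -> None:
        """Record that `backend` returned HTTP 429 and should be skipped
        for `retry_after_s` seconds (e.g. taken from a Retry-After header)."""
        now = time.monotonic() if now is None else now
        self.throttled_until[backend] = now + retry_after_s


# Usage with hypothetical backend URLs and explicit timestamps.
pool = BackendPool(
    ["https://aoai-eastus.example.com", "https://aoai-westus.example.com"],
    rng=random.Random(0),
)
pool.mark_throttled("https://aoai-eastus.example.com", retry_after_s=30, now=100.0)
print(pool.pick(now=110.0))  # https://aoai-westus.example.com (eastus skipped until t=130)
```

Without any recorded state this reduces to plain random selection, which matches the "sufficient for many use cases" case in the reply; the throttle map is the optional "more sophisticated" layer. In APIM itself, the equivalent logic would live in policy expressions, for example computing a `base-url` for `set-backend-service` and keeping state with `cache-store-value` / `cache-lookup-value`; check the APIM policy reference for the exact attributes. Falling back to the full pool when every backend is throttled is one design choice; returning a 429 to the caller immediately is another.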