Load-Balancing API Requests
Looking at the diagram, I don't see how Azure API Management would load-balance across multiple Azure OpenAI instances. Typically, I would also expect to see an internal Azure Application Gateway between Azure API Management and the Azure OpenAI instances, in addition to the depicted external Azure Application Gateway.
Document Details
⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.
- ID: 59e64035-d0cb-d2cf-0848-4153b5cfe876
- Version Independent ID: 59e64035-d0cb-d2cf-0848-4153b5cfe876
- Content: Implement logging and monitoring for Azure OpenAI large language models - Azure Architecture Center
- Content Source: docs/ai-ml/openai/architecture/log-monitor-azure-openai.yml
- Service: architecture-center
- Sub-service: example-scenario
- GitHub Login: @jakeatmsft
- Microsoft Alias: jacwang
@simonkurtz-MSFT Thanks for your feedback! We will investigate and update as appropriate.
@simonkurtz-MSFT - We are leveraging APIM policy to provide the load-balancing functionality for the backend Azure OpenAI resources. Random load balancing is sufficient for many use cases. Additionally, you can use the APIM native cache to persist the state of each resource/model if you want to be more sophisticated in your request routing.
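
For reference, here is a minimal sketch of the kind of inbound APIM policy this describes: picking one of several backend Azure OpenAI endpoints at random per request. The endpoint URLs are placeholders, and the policy used in the published architecture may differ.

```xml
<policies>
    <inbound>
        <base />
        <!-- Randomly select one of the backend Azure OpenAI endpoints for this request.
             The URLs below are placeholders; substitute your own resource endpoints. -->
        <set-variable name="backendUrl" value='@{
            string[] backends = new string[] {
                "https://YOUR-AOAI-EASTUS.openai.azure.com",
                "https://YOUR-AOAI-WESTUS.openai.azure.com"
            };
            return backends[new Random().Next(backends.Length)];
        }' />
        <!-- Route the request to the selected backend. -->
        <set-backend-service base-url='@((string)context.Variables["backendUrl"])' />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

For stateful routing, the `cache-store-value` and `cache-lookup-value` policies can be used to keep per-backend state (for example, health or throttling status) in the APIM cache and factor it into the backend selection.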