Reconsider default Azure OpenAI rate limits
Today, when using Azure OpenAI with .NET Aspire, you get the following defaults on your deployment:
- SkuName: "Standard"
- SkuCapacity: 1
https://github.com/dotnet/aspire/blob/604f62f5c917f9a855b5188b1be78deac6234ad9/src/Aspire.Hosting.Azure.CognitiveServices/AzureOpenAIDeployment.cs#L14
> `capacity` (integer): This represents the amount of quota you are assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM). A value of 10 equals 10k Tokens per Minute (TPM).
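For illustration, here is roughly what a caller has to do today to raise the limit themselves. This is a sketch only: the `AddAzureOpenAI`/`AddDeployment` shapes and the constructor parameter order follow the linked commit, and the model name/version are just example values.

```csharp
var builder = DistributedApplication.CreateBuilder(args);

// Passing explicit values overrides the SkuName/SkuCapacity defaults.
var openai = builder.AddAzureOpenAI("openai")
    .AddDeployment(new AzureOpenAIDeployment(
        "gpt-4o",       // deployment name
        "gpt-4o",       // model name
        "2024-05-13",   // model version
        "Standard",     // SkuName (the current default)
        8));            // SkuCapacity: 8 => 8,000 TPM instead of the 1,000 TPM default

builder.Build().Run();
```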
(One problem is that we are using default parameter values in that constructor, which hard-codes these values into the caller; that means we can't update them in the future without people recompiling their code.)
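As a minimal, self-contained illustration of that point (not Aspire code): the C# compiler copies a default parameter value into the call site, so changing the default in the library has no effect on apps that were compiled against the old value.

```csharp
public class Deployment
{
    public int SkuCapacity { get; }

    // If this default later changes to 8, already-compiled callers still pass 1
    // until they are recompiled.
    public Deployment(int skuCapacity = 1) => SkuCapacity = skuCapacity;
}

public static class Program
{
    public static void Main()
    {
        // Compiles as `new Deployment(1)`; the literal 1 is baked into this assembly.
        var deployment = new Deployment();
        System.Console.WriteLine(deployment.SkuCapacity); // prints 1
    }
}
```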
We should reconsider the default value of 1,000 tokens per minute. With this limit, not even the simple playground app in the repo works out of the box:
https://github.com/dotnet/aspire/blob/604f62f5c917f9a855b5188b1be78deac6234ad9/playground/OpenAIEndToEnd/OpenAIEndToEnd.WebStory/Components/Pages/Home.razor#L31
Just running this prompt twice in a row (because of how Blazor Server works with prerendering) causes 429 rate limit errors.
One reason to use such a low limit is that deployment will succeed even when using a Free Trial subscription: https://learn.microsoft.com/azure/ai-services/openai/quotas-limits#other-offer-types. However, suffering through rate limit errors for everyone may be worse than having deployment fail for the low-end subscriptions (free, student).
When I try a value that is too big, I get a pretty decent error:
```
Azure.RequestFailedException: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '476657d0-1ec0-4746-9125-6b0c12cddabc'. See inner errors for details.
Status: 400 (Bad Request)
ErrorCode: InvalidTemplateDeployment
Content:
{"error":{"code":"InvalidTemplateDeployment","message":"The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '476657d0-1ec0-4746-9125-6b0c12cddabc'. See inner errors for details.","details":[{"code":"InsufficientQuota","message":"This operation require 10000 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, which is bigger than the current available capacity 150. The current quota usage is 0 and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."}]}}
```
Using a higher default will make more scenarios work out of the box for more customers, while it may mean some customers see an error during deployment; with a good error message, they can search for how to fix it.
cc @mitchdenny @davidfowl @tg-msft
Another option would be to not specify a default SkuCapacity in .NET Aspire at all, and simply leave it blank when the user doesn't provide one. Azure would then set a default capacity appropriate for the given subscription/account.
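A rough sketch of that shape (hypothetical names, not the actual Aspire provisioning code): make the capacity nullable and only emit it into the generated deployment when the user supplied a value, so the service-side default applies otherwise.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; type and property names are illustrative only.
public sealed class AzureOpenAIDeploymentOptions
{
    public required string Name { get; init; }
    public required string ModelName { get; init; }
    public required string ModelVersion { get; init; }

    // Null means "let Azure pick the default for this subscription/account".
    public string? SkuName { get; init; }
    public int? SkuCapacity { get; init; }
}

public static class DeploymentTemplateBuilder
{
    // Builds the sku fragment of the deployment template, omitting capacity
    // entirely when the caller didn't specify one.
    public static Dictionary<string, object> BuildSku(AzureOpenAIDeploymentOptions options)
    {
        var sku = new Dictionary<string, object>
        {
            ["name"] = options.SkuName ?? "Standard"
        };

        if (options.SkuCapacity is int capacity)
        {
            sku["capacity"] = capacity;
        }

        return sku;
    }
}
```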
@eerhardt this is surprising. When I created this playground sample, it worked just fine out of the box. I'm wondering if the token limit is a new thing?
Did you use an existing Azure OpenAI account? Or did you provision a new one using Aspire?
Provisioned a new one using Aspire.
Can you try it again? I don't know if something changed, but it didn't work out of the box for me.
Yeah, I can repro what you are seeing now. What is interesting is that on the initial render of the page we do get some content (and then it makes a second request - a bug, but it shouldn't be a big issue), and it's that second request and subsequent requests that fail.
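Side note on that double request: a minimal sketch of one way the page could avoid it, assuming a hypothetical `GenerateStoryAsync` helper that wraps the OpenAI call (the playground component's actual structure may differ). `OnAfterRenderAsync` does not run during prerendering, so the prompt would be sent only once per circuit.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Components;

public partial class Home : ComponentBase
{
    private string? _story;

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (!firstRender)
        {
            return;
        }

        // Defer the call to the first interactive render so prerendering
        // doesn't trigger a duplicate request.
        _story = await GenerateStoryAsync();
        StateHasChanged();
    }

    // Placeholder for the real OpenAI chat call used by the playground page.
    private Task<string> GenerateStoryAsync() =>
        Task.FromResult("(story text)");
}
```

Alternatively, prerendering could be turned off for the page (for example with `new InteractiveServerRenderMode(prerender: false)`), which also avoids the duplicate call.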
I'm in favor of increasing the limit to help folks find the happy path.
@sebastienros / @eerhardt do you think this will make it for 8.2 or should we push back to backlog and prioritize with everything else?
We have a PR out for it - #5374. The goal is to get this into 8.2.