Reconsider default Azure OpenAI rate limits
Today, when using Azure OpenAI with .NET Aspire, you get the following defaults on your deployment:
- SkuName: "Standard"
- SkuCapacity: 1
https://github.com/dotnet/aspire/blob/604f62f5c917f9a855b5188b1be78deac6234ad9/src/Aspire.Hosting.Azure.CognitiveServices/AzureOpenAIDeployment.cs#L14
> `capacity` (integer): This represents the amount of quota you are assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM). A value of 10 equals 10k Tokens per Minute (TPM).
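For illustration, here is roughly what a caller has to do today to raise the limit themselves. This is a sketch only: the `AddAzureOpenAI`/`AddDeployment` shapes and the constructor parameter order follow the linked commit, and the model name/version are just example values.

```csharp
var builder = DistributedApplication.CreateBuilder(args);

// Passing explicit values overrides the SkuName/SkuCapacity defaults.
var openai = builder.AddAzureOpenAI("openai")
    .AddDeployment(new AzureOpenAIDeployment(
        "gpt-4o",       // deployment name
        "gpt-4o",       // model name
        "2024-05-13",   // model version
        "Standard",     // SkuName (the current default)
        8));            // SkuCapacity: 8 => 8,000 TPM instead of the 1,000 TPM default

builder.Build().Run();
```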
(One problem is that we are using default parameter values in that constructor, which hard-codes these values into the caller; that means we can't update them in the future without people recompiling their code.)
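As a minimal, self-contained illustration of that point (not Aspire code): the C# compiler copies a default parameter value into the call site, so changing the default in the library has no effect on apps that were compiled against the old value.

```csharp
public class Deployment
{
    public int SkuCapacity { get; }

    // If this default later changes to 8, already-compiled callers still pass 1
    // until they are recompiled.
    public Deployment(int skuCapacity = 1) => SkuCapacity = skuCapacity;
}

public static class Program
{
    public static void Main()
    {
        // Compiles as `new Deployment(1)`; the literal 1 is baked into this assembly.
        var deployment = new Deployment();
        System.Console.WriteLine(deployment.SkuCapacity); // prints 1
    }
}
```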
We should reconsider the default value of 1,000 tokens per minute. With this limit, not even the simple playground app in the repo works out of the box:
https://github.com/dotnet/aspire/blob/604f62f5c917f9a855b5188b1be78deac6234ad9/playground/OpenAIEndToEnd/OpenAIEndToEnd.WebStory/Components/Pages/Home.razor#L31
Just running this prompt twice in a row (because of how Blazor Server works with prerendering) causes 429 rate limit errors.
One reason to use such a low limit is that deployment will succeed even when using a Free Trial subscription: https://learn.microsoft.com/azure/ai-services/openai/quotas-limits#other-offer-types. However, suffering through rate limit errors for everyone may be worse than having deployment fail for the low-end subscriptions (free, student).
When I try a value that is too big, I get a pretty decent error:
```
Azure.RequestFailedException: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '476657d0-1ec0-4746-9125-6b0c12cddabc'. See inner errors for details.
Status: 400 (Bad Request)
ErrorCode: InvalidTemplateDeployment
Content:
{"error":{"code":"InvalidTemplateDeployment","message":"The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '476657d0-1ec0-4746-9125-6b0c12cddabc'. See inner errors for details.","details":[{"code":"InsufficientQuota","message":"This operation require 10000 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, which is bigger than the current available capacity 150. The current quota usage is 0 and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."}]}}
```
Using a higher default will make more scenarios work out of the box for more customers, while it may mean some customers see an error during deployment; with a good error message, they can search for how to fix it.
cc @mitchdenny @davidfowl @tg-msft
Another option would be to not specify a default SkuCapacity in .NET Aspire at all, and simply leave it blank when the user doesn't provide one. Azure would then set a default capacity appropriate for the given subscription/account.
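A rough sketch of that shape (hypothetical names, not the actual Aspire provisioning code): make the capacity nullable and only emit it into the generated deployment when the user supplied a value, so the service-side default applies otherwise.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; type and property names are illustrative only.
public sealed class AzureOpenAIDeploymentOptions
{
    public required string Name { get; init; }
    public required string ModelName { get; init; }
    public required string ModelVersion { get; init; }

    // Null means "let Azure pick the default for this subscription/account".
    public string? SkuName { get; init; }
    public int? SkuCapacity { get; init; }
}

public static class DeploymentTemplateBuilder
{
    // Builds the sku fragment of the deployment template, omitting capacity
    // entirely when the caller didn't specify one.
    public static Dictionary<string, object> BuildSku(AzureOpenAIDeploymentOptions options)
    {
        var sku = new Dictionary<string, object>
        {
            ["name"] = options.SkuName ?? "Standard"
        };

        if (options.SkuCapacity is int capacity)
        {
            sku["capacity"] = capacity;
        }

        return sku;
    }
}
```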
@eerhardt this is surprising. When I created this playground sample, it worked just fine out of the box. I'm wondering if the token limit is a new thing?
Did you use an existing Azure OpenAI account? Or did you provision a new one using Aspire?
Provisioned a new one using Aspire.
Can you try it again? I don't know if something changed, but it didn't work out of the box for me.
Yeah, I can repro what you are seeing now. What is interesting is that on the initial render of the page we do get some content (and then it makes a second request - a bug, but it shouldn't be a big issue), and it's that second request and subsequent requests that fail.
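Side note on that double request: a minimal sketch of one way the page could avoid it, assuming a hypothetical `GenerateStoryAsync` helper that wraps the OpenAI call (the playground component's actual structure may differ). `OnAfterRenderAsync` does not run during prerendering, so the prompt would be sent only once per circuit.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Components;

public partial class Home : ComponentBase
{
    private string? _story;

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (!firstRender)
        {
            return;
        }

        // Defer the call to the first interactive render so prerendering
        // doesn't trigger a duplicate request.
        _story = await GenerateStoryAsync();
        StateHasChanged();
    }

    // Placeholder for the real OpenAI chat call used by the playground page.
    private Task<string> GenerateStoryAsync() =>
        Task.FromResult("(story text)");
}
```

Alternatively, prerendering could be turned off for the page (for example with `new InteractiveServerRenderMode(prerender: false)`), which also avoids the duplicate call.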
I'm in favor of increasing the limit to help folks find the happy path.
@sebastienros / @eerhardt do you think this will make it for 8.2 or should we push back to backlog and prioritize with everything else?
We have a PR out for it - #5374. The goal is to get this into 8.2.