azure-sdk-for-net
[BUG] Azure.AI.OpenAI: Exceeding rate limits results in a retry which ends with 401 Unauthorized
Library name and version
Azure.AI.OpenAI 2.0.0-beta.2
Describe the bug
We have a rather large prompt and a low rate limit of 1000 tokens/minute. Because of that combination, we can call the OpenAI endpoint only about once per minute.
If we exceed that limit, we get back an exception like this:
"message": "Service request failed.\nStatus: 401 (Unauthorized)\n",
"stackTrace": " at Azure.AI.OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)\n at Azure.AI.OpenAI.Chat.AzureChatClient.CompleteChatAsync(BinaryContent content, RequestOptions options)\n at OpenAI.Chat.ChatClient.<>c__DisplayClass8_0.<<CompleteChatStreamingAsync>g__getResultAsync|0>d.MoveNext()\n--- End of stack trace from previous location ---\n at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.CreateEventEnumeratorAsync()\n at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.System.Collections.Generic.IAsyncEnumerator<OpenAI.Chat.StreamingChatCompletionUpdate>.MoveNextAsync()\n at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)
We investigated further and found that the first response is actually a 429 (due to the rate limit). We assume the client then retries, the retry results in a 401, and that 401 is what surfaces in this exception.
This sends the investigation in the wrong direction: you think you have an authorization problem when it is actually a rate-limiting issue. The big question is why the request becomes unauthorized on retry.
Expected behavior
It should fail with an error indicating that the rate limit was exceeded (the 429). The surrounding code could then decide to retry after a delay.
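To illustrate what the surrounding code could do if the 429 were surfaced, here is a minimal sketch. `HttpStatusException` and `RateLimitRetry` are hypothetical names made up for this example and are not part of Azure.AI.OpenAI; in real code the filter would check the status code on whatever exception the client actually throws.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an exception that carries the HTTP status code.
public sealed class HttpStatusException : Exception
{
    public int Status { get; }

    public HttpStatusException(int status, string message) : base(message)
    {
        Status = status;
    }
}

public static class RateLimitRetry
{
    // Runs the operation and retries only when it fails with a 429,
    // waiting a fixed delay between attempts. Any other failure
    // (for example a 401) propagates immediately, as does the 429
    // from the final attempt.
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken);
            }
            catch (HttpStatusException ex) when (ex.Status == 429 && attempt < maxAttempts)
            {
                await Task.Delay(delay, cancellationToken);
            }
        }
    }
}
```

With a limit of 1000 tokens per minute, a delay on the order of a minute would be the natural choice. This only works if the exception carries 429 rather than 401, which is the point of this report.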
Actual behavior
It fails with a 401 (Unauthorized), which is misleading and cannot be handled properly by the surrounding code, since you typically would not retry on a 401.
Reproduction Steps
Configure a rate limit of 1000 tokens per minute. Then send a fairly large prompt that consumes most of that limit, using the streaming chat function. If you call it a second time within the same minute, the issue appears.
We can reproduce this both with WorkloadIdentityCredential when running in Kubernetes and with AzureCliCredential when running locally, so it does not seem to be tied to a particular token credential.
Environment
No response