[BUG] Azure.AI.OpenAI: Exceeding rate limits results in a retry which ends with 401 Unauthorized

Open molinch opened this issue 5 days ago • 2 comments

Library name and version

Azure.AI.OpenAI 2.0.0-beta.2

Describe the bug

We have a rather large prompt and a low rate limit of 1000 tokens/minute. Because of that combination, we can call the OpenAI endpoint only once per minute.

If we exceed that, we get an exception like this:

"message": "Service request failed.\nStatus: 401 (Unauthorized)\n",
"stackTrace": "   at Azure.AI.OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)\n   at Azure.AI.OpenAI.Chat.AzureChatClient.CompleteChatAsync(BinaryContent content, RequestOptions options)\n   at OpenAI.Chat.ChatClient.<>c__DisplayClass8_0.<<CompleteChatStreamingAsync>g__getResultAsync|0>d.MoveNext()\n--- End of stack trace from previous location ---\n   at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.CreateEventEnumeratorAsync()\n   at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.System.Collections.Generic.IAsyncEnumerator<OpenAI.Chat.StreamingChatCompletionUpdate>.MoveNextAsync()\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)

We investigated further: the first response is actually a 429 (due to the rate limit). I assume the client then retries, the retry results in a 401, and that 401 is what surfaces in this exception.

This sends the investigation in the wrong direction: you think you have an authorization problem when it is actually a rate-limiting issue. The big question is why the request becomes unauthorized on retry.

Expected behavior

It should fail with an error indicating that the rate limit was exceeded (the 429). The surrounding code could then decide to retry after a delay.
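To illustrate what "surrounding code" could do if the 429 were surfaced, here is a minimal sketch of a retry-with-delay wrapper. It is not taken from the SDK: `ServiceStatusException` and `RateLimitRetry` are hypothetical names, and the exception type is a stand-in that only carries an HTTP status code. The wrapper retries on 429 and rethrows everything else, including a 401, immediately.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for whatever exception the client throws;
// it only carries the HTTP status code.
public sealed class ServiceStatusException : Exception
{
    public int Status { get; }

    public ServiceStatusException(int status)
        : base($"Service request failed. Status: {status}") => Status = status;
}

public static class RateLimitRetry
{
    // Retries only when the operation fails with status 429, waiting `delay`
    // between attempts. Any other failure (for example a 401) propagates at once,
    // and a 429 on the last allowed attempt propagates as well.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (ServiceStatusException ex) when (ex.Status == 429 && attempt < maxAttempts)
            {
                // Rate limited: wait before the next attempt.
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
```

In real code, `operation` would wrap the chat call, and the catch filter would check whichever exception type and status property the client actually exposes. With a 1000 tokens/minute limit, a delay on the order of a minute would be the natural choice.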

Actual behavior

It fails with a 401 unauthorized, which is misleading and cannot be properly handled by surrounding code. Typically you wouldn't retry on a 401.

Reproduction Steps

Set a rate limit of 1000 tokens per minute. Send a fairly large prompt that consumes most of it by calling the streaming chat function. Call it a second time within the same minute, and the issue appears.

We reproduce this issue both with WorkloadIdentityCredential when running in Kubernetes and with AzureCliCredential when running locally, so it does not seem to be tied to a particular token credential.

Environment

No response

molinch, Jun 29 '24 07:06