sample-app-aoai-chatGPT
BUG: Getting quota errors in webApp but not in the playground
Hello! I'm getting this error in the web app every time on the 2nd message (the first one is fine):
Error
Requests to the Creates a completion for the chat message Operation under Azure OpenAI API version 2023-03-15-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 7 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
But via the Azure OpenAI Studio playground everything works fine for the same deployment/model -- I can ask 10 questions in a row with no issues. Why is that happening? And why does the error say "completion" and not "prompt"?
I've tried different models, i.e. 3.5-turbo and gpt-4, new and old -- the behavior is the same everywhere. The model currently in use is gpt-3.5-turbo (0301) in West Europe. My token limit is set to 7K TPM.
P.S. I'm actually running the image in a Kubernetes cluster, not in a Web App, but I guess that shouldn't make any difference in this case.
I've tried playing around with the API version, but no luck:
When in app.py I set the API version to 2023-06-01-preview instead of the originally set 2023-03-15-preview, it's even worse -- it fails with the same error straight away on the first message.
With the older 2022-12-01, it says "Resource not found".
I've now left it on the stable release "2023-05-15" -- same behavior as with 2023-03-15-preview.
https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#may-2023
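For reference, this is roughly how the API version can be pinned in the openai 0.x SDK that this sample used at the time. The helper and the environment variable name here are illustrative, not copied from app.py. Versions older than 2023-03-15-preview predate the chat-completions route, which is the likely reason 2022-12-01 returns "Resource not found".

```python
import os

def resolve_api_version(default: str = "2023-05-15") -> str:
    # Illustrative helper: let an env var override the API version,
    # falling back to the stable 2023-05-15 release.
    return os.environ.get("AZURE_OPENAI_API_VERSION", default)

# openai 0.x SDK style (commented out; requires the openai package):
# openai.api_type = "azure"
# openai.api_version = resolve_api_version()
```

Switching the string alone doesn't change quota behavior, though -- the TPM limit applies regardless of the API version used.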
Ahh, maybe it's still something specific to my Kubernetes deployment... this is what I see in the pod logs:
GET /assets/Send-d0601aaa.svg => generated 0 bytes in 0 msecs (HTTP/1.1 304) 4 headers in 185 bytes (0 switches on core 0)
2023-08-03T15:20:54.437250234Z [pid: 1|app: 0|req: 15/15] 172.22.129.13 () {70 vars in 5804 bytes} [Thu Aug 3 15:20:52 2023] POST /conversation => generated 468 bytes in 1582 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
2023-08-03T15:21:00.899604966Z ERROR:root:Exception in /conversation
2023-08-03T15:21:00.899645167Z Traceback (most recent call last):
2023-08-03T15:21:00.899648867Z File "app.py", line 252, in conversation
2023-08-03T15:21:00.899651467Z return conversation_without_data(request)
2023-08-03T15:21:00.899653967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-03T15:21:00.899656668Z File "app.py", line 214, in conversation_without_data
2023-08-03T15:21:00.899659368Z response = openai.ChatCompletion.create(
2023-08-03T15:21:00.899661868Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-03T15:21:00.899665768Z File "/usr/local/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
2023-08-03T15:21:00.899668468Z return super().create(*args, **kwargs)
2023-08-03T15:21:00.899670968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-03T15:21:00.899674568Z File "/usr/local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
2023-08-03T15:21:00.899677168Z response, _, api_key = requestor.request(
2023-08-03T15:21:00.899680968Z ^^^^^^^^^^^^^^^^^^
2023-08-03T15:21:00.899685568Z File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 230, in request
2023-08-03T15:21:00.899689069Z resp, got_stream = self._interpret_response(result, stream)
2023-08-03T15:21:00.899692269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-03T15:21:00.899695869Z File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 624, in _interpret_response
2023-08-03T15:21:00.899699269Z self._interpret_response_line(
2023-08-03T15:21:00.899702269Z File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 687, in _interpret_response_line
raise self.handle_error_response(
2023-08-03T15:21:00.899708469Z openai.error.RateLimitError: Requests to the Creates a completion for the chat message Operation under Azure OpenAI API version 2023-05-15 have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 53 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
2023-08-03T15:21:00.899786772Z [pid: 1|app: 0|req: 16/16] 172.22.129.13 () {70 vars in 5805 bytes} [Thu Aug 3 15:21:00 2023] POST /conversation => generated 335 bytes in 14 msecs (HTTP/1.1 500) 2 headers in 91 bytes (1 switches on core 0)
I can't figure out where it went wrong...
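Since the server even suggests "Please retry after N seconds", one common workaround is to wrap the call in a retry loop. Below is a minimal, hedged sketch of exponential backoff -- it is not part of the sample app. In the openai 0.x SDK shown in the traceback, the exception to pass as `retryable` would be `openai.error.RateLimitError`; here a generic exception tuple is injected so the sketch stays self-contained.

```python
import time

def call_with_backoff(fn, *, max_retries=5, base_delay=1.0,
                      retryable=(Exception,), sleep=time.sleep):
    # Retry fn() with exponential backoff whenever a retryable error
    # is raised; re-raise once the retry budget is exhausted.
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Hypothetical usage against the 0.x SDK:
# call_with_backoff(
#     lambda: openai.ChatCompletion.create(engine=..., messages=...),
#     retryable=(openai.error.RateLimitError,),
# )
```

A retry only papers over the symptom, though -- if the TPM quota is too low for the traffic, the sustainable fix is raising the quota.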
What TPM do you currently have for your deployment? Each question takes around 1000 tokens on average, so it is easy to exceed the rate limits if your deployment has a low TPM.
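The arithmetic above can be made concrete. Note that a chat app typically resends the prior turns with each request, so the per-request token cost grows as the conversation gets longer -- which may be why the second message is the one that trips the limit. The function name below is illustrative.

```python
def max_requests_per_minute(tpm_quota: int, avg_tokens_per_request: int) -> int:
    # Rough ceiling on sustained request rate under a tokens-per-minute
    # quota: total token budget divided by the cost of one request.
    return tpm_quota // avg_tokens_per_request

# With the 7K TPM quota from the report and ~1000 tokens per question:
print(max_requests_per_minute(7000, 1000))  # → 7
```

At 7K TPM, two back-to-back messages whose combined prompt + completion + resent history exceed 7000 tokens will already trigger the 429.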
Thanks Pamela, I'm just not sure why I don't get the same result in the OpenAI Studio in that case... I'll try setting the limit higher then.
Having the same issue with a totally different stack, FWIW. Using a llama-index and Python stack in a k8s deployment, and I'm getting this message roughly one in every 4 to 10 messages.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.