azure-search-openai-demo
Very slow generation of responses in the App: OpenAI in one region and App service and AI search in another region
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [x] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Increase max_tokens to 300 for the keyword search step and response_token_limit to 4096.
The OpenAI service is deployed in Canada East and all other services are in East US.
chat_completion = await openai.ChatCompletion.acreate(
    **chatgpt_args,
    model=self.chatgpt_model,
    messages=messages,
    temperature=0.0,
    max_tokens=300,  # Setting too low risks malformed JSON, setting too high may affect performance
    n=1,
    functions=functions,
    function_call="auto",
)
response_token_limit = 4096
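For context, a larger response_token_limit mainly changes how much conversation history fits in the prompt. The trimming idea can be sketched roughly like this (a sketch, not the app's actual code; `count_tokens` here is a crude word-count stand-in, whereas the real app uses a tokenizer-based count):

```python
def count_tokens(message: dict) -> int:
    # Stand-in: real code would use a tokenizer (e.g. tiktoken) on the content.
    return len(message["content"].split())

def trim_history(system: dict, history: list, limit: int) -> list:
    """Keep the system message plus the newest history turns that fit in `limit` tokens."""
    budget = limit - count_tokens(system)
    kept = []
    for message in reversed(history):  # newest turns first
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system] + list(reversed(kept))
```

With a higher limit, more turns survive trimming, which also means larger prompts sent to the model on every request.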
Any log messages given by the failure
No error; the app is just very slow and takes a long time to generate the response.
Expected/desired behavior
Fast response generation
GPT Version
gpt-4-32k
Mention any other details that might be useful
The App is too slow and takes too long to generate a response, even for a simple greeting. Does anyone have an idea of how to solve this or what may have caused it?
I'm using App Service Plan : B2:2
Thanks!
@hicham-aigp Thanks for sharing your experience. A few questions:
- Have you enabled Application Insights and checked the traces to confirm that the performance issue is definitely with the ChatCompletion call, versus the other steps of generating an answer?
- Have you tried OpenAI in different regions?
- Have you tried a model other than gpt-4-32k?
- Is it also slow in the Azure OpenAI Studio?
I have heard that there can be varying performance characteristics across regions and models. I've even heard of one case where someone improved their performance by making a new deployment of the same region/model. I don't know that there's anything we can do in this app code, but I can pass on feedback if we have more details about the slow performance.
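Short of enabling Application Insights, the ChatCompletion call can be timed directly to confirm it dominates the total response time, e.g. with a small helper like this (a sketch; the wrapped call would be whatever coroutine the app awaits):

```python
import time

async def timed(coro):
    """Await `coro` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

# usage inside the approach, e.g.:
# chat_completion, elapsed = await timed(openai.ChatCompletion.acreate(**chatgpt_args, ...))
```

Logging `elapsed` for the completion call versus the search call would show which step is actually slow.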
Hi @pamelafox, Thanks for the answer.
- Application Insights is not enabled, but I checked the logs of the App Service (no errors)
- I wanted to use gpt-4 in East US, but it wasn't available on my subscription
- It was previously working well, but I wanted to update and leverage the gpt-4 capabilities, especially in terms of token limit
- In the OpenAI Studio it also tends to be a bit slow, but not as slow as in the App
Not sure what could be a solution for this! If you have any other suggestions or recommendations, please share.
Thanks a lot!
@pamelafox I have replaced gpt-4-32k with gpt-35-16k in another region (the same region as the App Service and Search services) and the issue is gone; however, the quality of responses decreases.
Any updates from the team on how to solve this for gpt-4?
Thank you.
Hi @pamelafox, just looping in to check whether there are any updates on this issue. The response time is still significantly high. In addition, the generation process tends to stop and resume, leading to a less fluid interaction experience. The pausing can last for a few seconds before response generation resumes, affecting overall usability.
Overall, gpt-4 is still very slow. Any insights or fixes would be greatly appreciated. Thank YOU!
I've asked around regarding GPT-4 and have found that other developers have seen similar latency. I assume you're using the pay-as-you-go pricing tier, which doesn't come with any latency guarantees. For latency assurance, Azure recommends PTUs: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput Those can be expensive, however, since you're pre-reserving a bunch of capacity.
The other approach I've heard of is to use openai.com's OpenAI service instead of Azure OpenAI. That may be slightly faster due to the lack of the content safety filter service and other protections (but then you lose those protections).
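With the 0.x SDK used in the snippet above, switching backends mostly comes down to a few module-level settings, plus passing `deployment_id` only on Azure. A sketch (the endpoint, deployment name, and hard-coded API version are placeholders/assumptions, not values from this repo):

```python
def backend_settings(use_azure: bool, api_key: str, azure_endpoint: str = "") -> dict:
    """Build the module-level settings and per-call kwargs for the 0.x openai SDK.

    The caller would apply these as openai.api_type = ..., openai.api_base = ...,
    etc., and splat `call_kwargs` into ChatCompletion.acreate.
    """
    if use_azure:
        return {
            "api_type": "azure",
            "api_base": azure_endpoint,  # e.g. https://<resource>.openai.azure.com
            "api_version": "2023-05-15",  # placeholder Azure API version
            "api_key": api_key,
            "call_kwargs": {"deployment_id": "my-gpt4-deployment"},  # placeholder name
        }
    return {
        "api_type": "open_ai",
        "api_base": "https://api.openai.com/v1",
        "api_key": api_key,
        "call_kwargs": {},  # openai.com addresses the model directly via `model=`
    }
```

The key asymmetry is that Azure routes requests to a named deployment, while openai.com uses the model name alone.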
You could also try some prompt engineering or few-shot prompting to improve the quality of the responses for gpt-3.5, but I'm guessing you've tried that already.
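Few-shot prompting for gpt-3.5 just means prepending worked example turns to the message list before the real question; a minimal sketch (the example Q/A pair is a placeholder, not from this repo):

```python
FEW_SHOT_EXAMPLES = [
    # Placeholder demonstration turn showing the desired answer style:
    {"role": "user", "content": "What does the benefits plan cover?"},
    {"role": "assistant", "content": "The plan covers X, Y, and Z [source1.pdf]."},
]

def build_messages(system_prompt: str, few_shot: list, question: str) -> list:
    """System prompt first, then few-shot examples, then the real question."""
    return (
        [{"role": "system", "content": system_prompt}]
        + few_shot
        + [{"role": "user", "content": question}]
    )
```

A couple of well-chosen examples often nudges gpt-3.5 toward the answer format and citation style that gpt-4 produces unprompted.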
And have you tried gpt-4 in every possible region where it's available to you?
Thanks @pamelafox for the detailed response and help! I'm on an Azure Sponsorship tenant and tried to use PTUs in Canada East, where my gpt-4-32k is, but couldn't find the option. I checked East US 2 and it was there, but I couldn't create the model because of quota (I requested more and am still waiting!)
As for the suggested approach, we would like to keep using Azure OpenAI, although the filter is sometimes inconsistent and we wish to disable it (another story!).
As for gpt-3.5, I did try it before and retried it, but the quality of responses is far inferior to the gpt-4 responses in my case, with the same parameters and system prompt.
As for the other regions, doesn't the network cause additional latency as well?
Okay, so it sounds like you're awaiting quota to use PTUs in eastus2? For the filter, you can request disabling it via a form linked from the Portal. Let them know what inconsistency you see so they can improve the service.
Re other regions: yes, a farther-away region could add latency, but it may be worth verifying that those regions are equally slow or slower. I think swedencentral has more capacity these days; it may be worth checking.
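Comparing regions is easiest with a tiny harness that times the same prompt against each deployment; a sketch (the callables would wrap real completion calls against per-region deployments, which are assumptions here):

```python
import time

def compare_latency(calls: dict) -> dict:
    """Time each zero-argument callable once; return {label: seconds}, fastest first."""
    timings = {}
    for label, call in calls.items():
        start = time.perf_counter()
        call()  # e.g. a wrapper that sends one fixed prompt to a regional deployment
        timings[label] = time.perf_counter() - start
    return dict(sorted(timings.items(), key=lambda kv: kv[1]))
```

Running the same prompt a few times per region (network latency included) would show whether the extra round-trip distance or the model deployment itself dominates.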
Thank you @pamelafox, I will do further checks on this