
Very slow generation of responses in the App: OpenAI in one region and App service and AI search in another region

Open hicham-aigp opened this issue 1 year ago • 8 comments

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [x] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Increase max tokens to 300 for keyword search and 4096 for response_token_limit

The OpenAI service is deployed in Canada East and all other services are in East-US.

```python
chat_completion = await openai.ChatCompletion.acreate(
    **chatgpt_args,
    model=self.chatgpt_model,
    messages=messages,
    temperature=0.0,
    max_tokens=300,  # Setting too low risks malformed JSON, setting too high may affect performance
    n=1,
    functions=functions,
    function_call="auto",
)

response_token_limit = 4096
```

Any log messages given by the failure

No error; the app is very slow and takes a long time to generate the response.


Expected/desired behavior

Fast response generation

GPT Version

gpt-4-32k

Mention any other details that might be useful

The App is too slow and takes too long to generate a response, even for a simple greeting. Does anyone have an idea of how to solve this or what may have caused it?

I'm using App Service Plan : B2:2


Thanks!

hicham-aigp avatar Dec 13 '23 16:12 hicham-aigp

@hicham-aigp Thanks for sharing your experience. A few questions:

  1. Have you enabled Application Insights and checked the traces to confirm that the performance issue is definitely with the ChatCompletion call, versus the other steps of generating an answer?

  2. Have you tried OpenAI in different regions?

  3. Have you tried a model other than gpt-4-32K?

  4. Is it also slow in the Azure OpenAI studio?

I have heard that there can be varying performance characteristics across regions and models. I've even heard of one case where someone improved their performance by making a new deployment of the same region/model. I don't know that there's anything we can do in this app's code, but I can pass on feedback if we have more details about the slow performance.
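As a starting point for question 1, even without Application Insights you can isolate which step is slow by timing each awaited call. This is a minimal sketch (the `timed` helper and `fake_completion` stand-in are hypothetical, not part of the repo); in the app you would wrap the `openai.ChatCompletion.acreate(...)` call and the search query the same way:

```python
import asyncio
import time

async def timed(label, coro):
    """Await a coroutine and report how long it took, to isolate slow steps."""
    start = time.perf_counter()
    result = await coro
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Stand-in coroutine for illustration; replace with the real
# openai.ChatCompletion.acreate(...) call in the app.
async def fake_completion():
    await asyncio.sleep(0.1)  # simulates the model call's latency
    return {"choices": [{"message": {"content": "hi"}}]}

result, elapsed = asyncio.run(timed("chat_completion", fake_completion()))
```

If the ChatCompletion call dominates the total time, the bottleneck is the model/region rather than the app code.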

pamelafox avatar Dec 13 '23 17:12 pamelafox

Hi @pamelafox, Thanks for the answer.

  1. Application Insights is not enabled, but I checked the App Service logs (no errors).
  2. I wanted to use gpt-4 in East-US, but it wasn't available on my subscription.
  3. It was previously working well, but I wanted to update and leverage the gpt-4 capabilities, especially the larger token limit.
  4. In the OpenAI Studio it also tends to be a bit slow, but not as slow as in the App.

Not sure what the solution for this could be! Please let me know if you have any other suggestions or recommendations.

Thanks a lot!

hicham-aigp avatar Dec 14 '23 16:12 hicham-aigp

@pamelafox I have replaced gpt-4-32k with gpt-35-16k in another region (the same region as the App Service and search services) and the issue is gone; however, the quality of the responses decreases.

Any updates from the team on how to solve this for gpt-4?

Thank you.

hicham-aigp avatar Dec 19 '23 18:12 hicham-aigp

Hi @pamelafox, just checking in to see if there are any updates on this issue. The response time is still significantly high. In addition, the generation process tends to stop and resume, leading to a less fluid interaction experience. The pausing can last a few seconds before response generation resumes, affecting overall usability.

Overall, GPT-4 is still very slow; please share any updates if you have them. Any insights or fixes would be greatly appreciated! Thank YOU!

hicham-aigp avatar Feb 08 '24 16:02 hicham-aigp

I've asked around regarding GPT-4, and have found that other developers have seen similar latency. I assume you're using the Pay-as-you-go pricing tier, which doesn't come with any latency guarantees. For latency assurance, Azure recommends PTUs: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput. Those can be expensive, however, since you're pre-reserving a bunch of capacity.

The other approach I've heard is to use openai.com OpenAI instead of Azure OpenAI. That may be slightly faster due to the lack of the content safety filter service and other protections (but then you lose those protections).
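If you do want to experiment with openai.com, this repo can be pointed at it through azd environment variables. This is a sketch; the exact variable names are an assumption, so verify them against the customization docs in your checkout:

```shell
# Switch the deployed app from Azure OpenAI to openai.com
# (assumed env vars from the repo's customization docs):
azd env set OPENAI_HOST openai
azd env set OPENAI_API_KEY your-openai-api-key
azd env set OPENAI_ORGANIZATION your-org-id
azd up  # redeploy so the app picks up the new settings
```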

You could also try some prompt engineering or few-shot prompting to improve the quality of the responses for gpt-3.5, but I'm guessing you've tried that already.
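For the few-shot idea, the gist is to prepend a couple of example question/answer turns to the message list so gpt-3.5 imitates the desired answer style. A minimal sketch (the `build_messages` helper and the example content are hypothetical, not this repo's prompt):

```python
# Few-shot example turns that steer the model toward grounded, cited answers.
few_shot_messages = [
    {"role": "system", "content": "Answer using only the provided sources, citing them like [doc1.pdf]."},
    {"role": "user", "content": "What is the deductible? Sources: benefits.pdf: The deductible is $500."},
    {"role": "assistant", "content": "The deductible is $500 [benefits.pdf]."},
]

def build_messages(question, sources):
    """Append the real question (and its sources) after the few-shot examples."""
    return few_shot_messages + [
        {"role": "user", "content": f"{question} Sources: {sources}"}
    ]

msgs = build_messages("What is the copay?", "benefits.pdf: The copay is $20.")
```

The resulting `msgs` list would then be passed as the `messages` argument to the ChatCompletion call.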

And have you tried gpt-4 in every possible region where it's available to you?

pamelafox avatar Feb 09 '24 20:02 pamelafox

Thanks @pamelafox for the detailed response and help! I'm on the Azure Sponsorship tenant and tried to use PTUs in Canada East, where my gpt-4-32k is, but couldn't find the option. I checked East-US 2 and it was there, but I couldn't create the deployment because of the quota (I requested more and am still waiting!).

As for the suggested approach, we would like to keep using Azure OpenAI, although the filter is sometimes inconsistent and we wish to disable it (another story!).

As for gpt-3.5, I did try it before and retried it, but the quality of the responses is way inferior to the GPT-4 responses in my case, with the same parameters and system prompt.

For the other regions, doesn't the network cause additional latency as well?

hicham-aigp avatar Feb 12 '24 18:02 hicham-aigp

Okay, so it sounds like you're awaiting quota to use PTUs on eastus2? For the filter, you can request disabling it via a form linked from the Portal. Let them know what inconsistency you see so they can improve the service.

Re other regions: yes, a farther-away region could add to latency, but it may be worth verifying that it's equally slow or slower in those regions. I think swedencentral has more capacity these days, so it may be worth checking.

pamelafox avatar Feb 12 '24 20:02 pamelafox

Thank you @pamelafox, I will do further checks on this

hicham-aigp avatar Feb 13 '24 19:02 hicham-aigp