
Sample app response is slow like 10-20s sometimes

Open poweihuang0817 opened this issue 2 years ago • 2 comments

Describe the bug Sample app response is slow, sometimes 10-20 seconds. Even though I'm using a paid OpenAI account, there are probably too many round trips.

  Request finished HTTP/2 POST https://localhost:40443/skills/ChatSkill/functions/Chat/invoke application/json 1132 - 200 - application/json;+charset=utf-8 103352.8697ms

To Reproduce Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. Windows]
  • IDE: [e.g. Visual Studio, VS Code]
  • NuGet Package Version [e.g. 0.1.0]

Additional context Add any other context about the problem here.

poweihuang0817 avatar May 15 '23 18:05 poweihuang0817

It looks like you're using the Copilot Chat sample. For the moment, to speed things up you can edit appsettings.json and set Planner:Enabled to false. While this will disable plugins and some other interesting features, you will get a faster response.
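For reference, the change would look roughly like the fragment below. This is a minimal sketch; the surrounding structure of the sample's appsettings.json may differ in your version, so only the Planner:Enabled value is the point here.

```json
{
  "Planner": {
    "Enabled": false
  }
}
```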

We are continuing to explore strategies to improve response time and will likely add chat streaming to Copilot Chat in the near future (#829 added streaming support to SK).

craigomatic avatar May 16 '23 20:05 craigomatic

But with the planner disabled, could we still invoke a custom skill? Would we be able to select one skill from many? It looks like we couldn't?

poweihuang0817 avatar May 17 '23 16:05 poweihuang0817

We recently removed the Planner:Enabled setting; the planner now disables automatically if the user on the frontend has not enabled any plugins. https://github.com/microsoft/semantic-kernel/pull/1151

adrianwyatt avatar May 24 '23 18:05 adrianwyatt

The lag in chat responses comes from the chat app running 1-2 round trips with the AI per message (3 if a plugin/planner is enabled). We also call the AI /embeddings endpoint to create long-term memories, which adds at least one more round trip. After the /Build conference we are taking a look at how we might be able to bring this response time down through parallelization and a different memories architecture.
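To illustrate why parallelization helps: sequential round trips add their latencies together, while independent calls (for example, generating the reply and creating the embedding for long-term memory) can run concurrently, so total latency approaches the slowest single call. The sketch below is a hypothetical model, not Copilot Chat's actual code; `chat_completion` and `create_embedding` are stand-ins that simulate network latency with sleeps.

```python
import asyncio
import time

LATENCY = 0.2  # simulated per-call network latency, in seconds

async def chat_completion() -> str:
    # Stand-in for the chat round trip that generates the bot reply.
    await asyncio.sleep(LATENCY)
    return "reply"

async def create_embedding() -> list[float]:
    # Stand-in for the /embeddings round trip used for long-term memory.
    await asyncio.sleep(LATENCY)
    return [0.1, 0.2]

async def sequential():
    # Round trips one after another: latencies sum (~2x LATENCY).
    reply = await chat_completion()
    memory = await create_embedding()
    return reply, memory

async def parallel():
    # Independent round trips run concurrently: latency ~1x LATENCY.
    return await asyncio.gather(chat_completion(), create_embedding())

start = time.perf_counter()
asyncio.run(sequential())
seq_elapsed = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(parallel())
par_elapsed = time.perf_counter() - start

print(f"sequential: {seq_elapsed:.2f}s, parallel: {par_elapsed:.2f}s")
```

With two 0.2 s calls, the sequential path takes roughly 0.4 s while the parallel path takes roughly 0.2 s; the same reasoning applies to the real chat and embeddings round trips when they do not depend on each other's results.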

adrianwyatt avatar May 24 '23 18:05 adrianwyatt