
Very slow inference (with local LLM) during agent work

Open · Pitachoo11 opened this issue 1 year ago · 2 comments

Describe the bug
Inference during agent work is very slow compared to normal LLM interaction. I'm running a local setup that connects over the local network to the TextGen WebUI API. Each TaskWeaver iteration is extremely slow: generation speed drops to around 1-2 t/s, versus 15-20 t/s on the same setup outside TaskWeaver.

At this rate the tool is not very useful; a simple coding task like printing numbers takes 20-30 minutes to complete. Is there any tweak to fix this? My guess is that it's caused by the relatively large context sent with each request.
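One way to test the large-context hypothesis is to time a single long-prompt completion directly against the TextGen WebUI endpoint and compare the measured t/s against the 15-20 t/s baseline. Below is a minimal Python sketch, assuming an OpenAI-compatible /v1/completions route on localhost port 5000; the URL, port, and payload fields are assumptions to adjust for your own setup:

    # Rough tokens/sec probe; endpoint URL, port, and fields are assumptions.
    import time
    import requests

    URL = "http://localhost:5000/v1/completions"  # assumed TextGen WebUI API route
    # Build a long prompt, roughly the size of a TaskWeaver planner prompt.
    prompt = "You are a planning agent. Decompose the task into steps. " * 300

    start = time.time()
    resp = requests.post(
        URL,
        json={"prompt": prompt, "max_tokens": 256, "temperature": 0},
        timeout=600,
    )
    elapsed = time.time() - start

    # OpenAI-compatible servers typically report token usage in the response.
    usage = resp.json().get("usage", {})
    done = usage.get("completion_tokens", 0)
    print(f"{done} completion tokens in {elapsed:.1f}s -> {done / elapsed:.1f} t/s")

If throughput only collapses as the prompt grows, that would point to the server side (context length, prompt processing, caching) rather than TaskWeaver itself.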

To Reproduce
Steps to reproduce the behavior:

  1. Start the service
  2. Enter any of the example queries from the project description
  3. Wait for the response forever

Expected behavior
Inference speed similar to AutoGen.

Environment Information (please complete the following information):

  • OS: macOS
  • Python Version: 3.11
  • LLM that you're using: a number of different 7B models

Pitachoo11 · Dec 06 '23 15:12


Hi, how do you run this with a local LLM?

WillianXu117 · Jan 23 '24 06:01
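For reference, TaskWeaver reads its LLM settings from project/taskweaver_config.json, so pointing it at a local OpenAI-compatible server (such as TextGen WebUI's API) looks roughly like the sketch below. This is a minimal example based on the project's documented configuration keys; the base URL, port, and model name are placeholders, not confirmed values:

    {
      "llm.api_type": "openai",
      "llm.api_base": "http://localhost:5000/v1",
      "llm.api_key": "not-needed-for-local",
      "llm.model": "your-local-7b-model"
    }

With the server's OpenAI-compatible API enabled, TaskWeaver should then talk to the local model through the standard OpenAI client path.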

Closing inactive issues.

liqul · Feb 04 '24 02:02