FastChat
FastChat with API using only one processor core on CPU for output generation
I am running this on CPU. During prompt processing, all available CPU cores are used (and, as expected, a larger input takes longer to process). But once the input has been read in, only one processor core actually does the generation when using the local OpenAI drop-in API, whereas the CLI uses all available cores the whole time. Is there a parameter that controls this, or is this a bug?
I think the CLI and the API (model worker) use the same underlying code. If your observation is correct, there might be a bug (e.g., in environment variable settings). Could you help us fix it?
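If environment variables are the culprit, one thing worth comparing between the two processes is the thread-count variables that CPU math libraries read at startup (e.g. `OMP_NUM_THREADS`, `MKL_NUM_THREADS`). A minimal stdlib sketch for checking what a process would see, assuming the backend honors `OMP_NUM_THREADS` (the helper `effective_threads` is hypothetical, just for illustration):

```python
import os

def effective_threads():
    """Return the thread count an OMP-aware math library would likely use:
    OMP_NUM_THREADS if set, otherwise all available cores."""
    val = os.environ.get("OMP_NUM_THREADS")
    return int(val) if val else os.cpu_count()

# Simulate launching the worker with the variable set vs. unset.
os.environ["OMP_NUM_THREADS"] = "8"
print(effective_threads())  # → 8
```

Running this check inside both the CLI process and the API model-worker process would show whether one of them is being launched with the variable pinned to 1.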
My python is not very good yet, but sure, let me know how I can help.
The CLI is using only 1 CPU core too.