
FastChat with API using only one processor core on CPU for output generation

Open linus-ahlemeyer opened this issue 2 years ago • 3 comments

I am running FastChat on CPU. With a long input, the prompt is processed on all available CPU cores and takes longer than with a short input. Once the input has been read, however, only one core does the actual generation when I use the local OpenAI-compatible drop-in API, whereas the CLI uses all available cores the whole time. Is there a parameter that controls this, or is this a bug?

linus-ahlemeyer Jun 10 '23 12:06

I think the CLI and the API (model worker) use the same underlying code. If your observation is correct, there might be a bug somewhere (e.g., in the environment variable settings). Could you help us fix it?
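
To compare the two environments, a small diagnostic like the one below could be run once in the shell that starts the CLI and once in the one that starts the model worker. This is only a sketch using the Python standard library: the variable names listed are ones that OpenMP/MKL/OpenBLAS-style backends commonly read for thread counts, and whether any of them is actually responsible here is an assumption. The helper names `thread_env_report` and `usable_cores` are made up for illustration and are not part of FastChat.

```python
import os

# Variables that OpenMP/MKL/OpenBLAS-style backends commonly read for thread counts.
# Assumption: one of these may differ between the CLI and the API worker environment.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def thread_env_report(env=None):
    """Return {variable: value or None} for the thread-related variables."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in THREAD_ENV_VARS}


def usable_cores():
    """Number of cores this process may run on (affinity-aware where supported)."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("usable cores:", usable_cores())
    for name, value in thread_env_report().items():
        print(f"{name} = {value if value is not None else '(unset)'}")
```

If the two runs print different values, that difference is a reasonable first place to look. With PyTorch installed, `torch.get_num_threads()` can additionally be printed in each process to see the intra-op thread count it actually uses.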

merrymercy Jun 11 '23 09:06

My Python is not very good yet, but sure, let me know how I can help.

linus-ahlemeyer Jun 11 '23 11:06

For me, the CLI is using only one CPU core too.

colin4k Jun 16 '23 08:06