FastChat
fastchat.serve.model_worker --device cpu only uses one CPU thread for token generation.
Hi,
I launch a worker with python3 -m fastchat.serve.model_worker --model-path /home/llamaweights/vicuna-13b --device cpu and then the web GUI, which works fine so far. When I make a request, after an initial loading time one core goes to 100% while the others idle. If I make a second request in another tab, a second core goes to 100% while the other 14 idle. Token generation is very slow, but it does not get any slower with additional requests. Can I somehow use all 16 threads, or at least all 8 cores, for a single request to speed up token generation?
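For reference, this is the kind of setting I would expect to control it, assuming the worker runs generation through PyTorch's intra-op thread pool on CPU (I have not checked whether FastChat exposes or overrides these anywhere); the value 16 is just my machine's thread count:

import os

# Environment variables need to be set before torch spins up its thread pools,
# e.g. OMP_NUM_THREADS=16 python3 -m fastchat.serve.model_worker ...
os.environ.setdefault("OMP_NUM_THREADS", "16")
os.environ.setdefault("MKL_NUM_THREADS", "16")

import torch

# Report and raise the number of threads torch uses for intra-op parallelism,
# i.e. the threads available to a single generation request.
print("intra-op threads:", torch.get_num_threads())
torch.set_num_threads(16)

Is something like this supposed to work with the model_worker, or is the single-threaded behaviour expected on the CPU path?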
Kind regards