FastChat
Adds a streaming API that clients call synchronously with plain POST requests.
The streaming API (api_stream.py) caches the worker's streamed response in a dict; the client sends repeated requests and reads the accumulated result from the cache. To start the stream API server:
# kill all running fastchat processes
pkill -9 -f fastchat
# start controller
nohup python -u -m fastchat.serve.controller >> fastchat.log 2>&1 &
# start worker
CUDA_VISIBLE_DEVICES=0 nohup python -u -m fastchat.serve.model_worker --model-name 'vicuna-7b-v1.1' --model-path vicuna_data/vicuna-7b-v1.1 >> fastchat.log 2>&1 &
# start api server
FASTCHAT_CONTROLLER_URL=http://localhost:21001 CUDA_VISIBLE_DEVICES=0 nohup python -u -m fastchat.serve.api_stream --host 0.0.0.0 --port 8000 >> fastchat.log 2>&1 &
# tail log
tail -f fastchat.log
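The dict-based caching described above can be pictured roughly as follows. This is only an illustrative sketch, not the actual code in api_stream.py: the class `StreamCache`, its methods, and the request key are hypothetical names, and appending `[stop]` when the stream ends is an assumption about how the stop word gets into the result.

```python
import threading

STOP_WORD = "[stop]"  # stop marker the client looks for when polling


class StreamCache:
    """Hypothetical in-memory cache mapping a request key to the text received so far."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def start(self, key, chunks):
        """Consume an iterable of text chunks in a background thread."""
        with self._lock:
            self._results[key] = ""
        worker = threading.Thread(
            target=self._consume, args=(key, chunks), daemon=True
        )
        worker.start()
        return worker

    def _consume(self, key, chunks):
        # Append each chunk as it arrives so that polls see partial output.
        for chunk in chunks:
            with self._lock:
                self._results[key] += chunk
        # Assumption: the end of the stream is signalled by appending the stop word.
        with self._lock:
            self._results[key] += STOP_WORD

    def get(self, key):
        """Return the text cached so far, or None for an unknown key."""
        with self._lock:
            return self._results.get(key)
```

In this sketch the first request for a key would call `start()` with the worker's stream, and every later request would only call `get()`, which is why repeated requests return a growing result.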
Test:
curl http://localhost:8000/v1/chat/completions/stream \
-H "Content-Type: application/json" \
-d '{"model": "vicuna-7b-v1.1","messages": [{"role": "user", "content": "Hello!"}]}'
Run curl at regular intervals until the returned result contains the stop word ([stop]).
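The polling loop can also be scripted instead of re-running curl by hand. The following is a minimal sketch using only the Python standard library. The helper names `fetch_once` and `poll_until_stop`, the one-second interval, and the poll limit are illustrative choices, and it assumes the response body is text that contains `[stop]` once generation has finished; the exact response format is not specified here.

```python
import json
import time
import urllib.request

STOP_WORD = "[stop]"  # stop word named in the instructions above


def fetch_once(url, payload):
    """POST the chat payload once and return the raw response body as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def poll_until_stop(fetch, interval=1.0, max_polls=120):
    """Call fetch() repeatedly until its result contains STOP_WORD.

    Returns the last response body, or raises TimeoutError if the stop
    word never appears within max_polls attempts.
    """
    for _ in range(max_polls):
        body = fetch()
        if STOP_WORD in body:
            return body
        time.sleep(interval)
    raise TimeoutError(f"no {STOP_WORD} after {max_polls} polls")


# Example usage against the server started above (same URL and body as the curl call):
#
#   url = "http://localhost:8000/v1/chat/completions/stream"
#   payload = {"model": "vicuna-7b-v1.1",
#              "messages": [{"role": "user", "content": "Hello!"}]}
#   print(poll_until_stop(lambda: fetch_once(url, payload)))
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP details, so it can be tried out with a stub before pointing it at a running server.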
https://user-images.githubusercontent.com/7981353/236590175-63c76d67-da09-448e-982c-94cf4ac2a721.mp4
https://user-images.githubusercontent.com/7981353/236590182-34c92beb-d89b-43a7-9d4b-a5da4a314e6c.mp4