FastChat
Different workers with different models don't update the web interface
When I run several workers, each serving a different model, the web interface still shows only one of them.
For example, here I have one worker on port 21002 and another on port 31001. Both run on the same machine as the controller and the web server. The machine has 4 GPUs. This is the controller log:
2023-06-16 10:11:45 | INFO | stdout | INFO: 127.0.0.1:36404 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:11:56 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:11:56 | INFO | stdout | INFO: 134.94.1.45:60698 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:23 | INFO | controller | Receive unknown heart beat. http://localhost:31001
2023-06-16 10:12:23 | INFO | stdout | INFO: 127.0.0.1:47560 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:23 | INFO | controller | Register a new worker: http://localhost:31001
2023-06-16 10:12:23 | INFO | controller | Register done: http://localhost:31001, {'model_names': ['vicuna-13b'], 'speed': 1, 'queue_length': 0}
2023-06-16 10:12:23 | INFO | stdout | INFO: 127.0.0.1:47566 - "POST /register_worker HTTP/1.1" 200 OK
2023-06-16 10:12:30 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:12:30 | INFO | stdout | INFO: 127.0.0.1:47582 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:41 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:12:41 | INFO | stdout | INFO: 134.94.1.45:60700 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:13:08 | INFO | controller | Receive heart beat. http://localhost:31001
2023-06-16 10:13:08 | INFO | stdout | INFO: 127.0.0.1:60632 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:13:15 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:13:15 | INFO | stdout | INFO: 127.0.0.1:38814 - "POST /receive_heart_beat HTTP/1.1" 200 OK
This looks totally fine: both workers are registered and sending heartbeats.
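To cross-check what the controller itself reports, independently of the web UI, one can ask it for its model list directly. The following is a minimal sketch, not FastChat's own code. It assumes the controller exposes POST endpoints named /refresh_all_workers and /list_models, and that the latter replies with a JSON object containing a "models" list; these names are assumptions and may differ between FastChat versions. The helper names post_json and list_models are made up for this example.

```python
import json
import urllib.request


def post_json(url, payload=None, timeout=10):
    """POST a JSON body and return the decoded JSON reply (None if the body is empty)."""
    data = json.dumps(payload or {}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def list_models(controller_url, post=post_json):
    """Return the sorted model names the controller currently advertises.

    Assumed endpoints (not verified against a specific FastChat version):
      /refresh_all_workers  asks the controller to re-check its workers
      /list_models          replies with {"models": [...]}
    """
    post(controller_url + "/refresh_all_workers")
    reply = post(controller_url + "/list_models")
    return sorted(reply["models"])


# Usage, with the controller address taken from the worker args in this report:
#   print(list_models("http://localhost:21001"))
```

If a model is missing from this list, the problem is on the controller or worker side; if it is present here but missing in the browser, the web server is showing a stale list.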
This is the worker with t5:
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path lmsys/fastchat-t5-3b-v1.0
2023-06-16 10:08:37 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='lmsys/fastchat-t5-3b-v1.0', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, model_names=None, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-06-16 10:08:37 | INFO | model_worker | Loading the model ['fastchat-t5-3b-v1.0'] on worker 0be21e ...
2023-06-16 10:08:45 | INFO | model_worker | Register to controller
2023-06-16 10:08:45 | ERROR | stderr | INFO: Started server process [1145659]
2023-06-16 10:08:45 | ERROR | stderr | INFO: Waiting for application startup.
2023-06-16 10:08:45 | ERROR | stderr | INFO: Application startup complete.
2023-06-16 10:08:45 | ERROR | stderr | INFO: Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2023-06-16 10:09:06 | INFO | stdout | INFO: 127.0.0.1:39262 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:09:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:10:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
And this is the worker with vicuna-13b:
CUDA_VISIBLE_DEVICES="0,1" python3 -m fastchat.serve.model_worker --model-path ../text-generation-webui/models/vicuna-13b/ --port 310001 --worker http://localhost:31001 --num-gpus 2
2023-06-16 10:07:37 | INFO | model_worker | args: Namespace(host='localhost', port=310001, worker_address='http://localhost:31001', controller_address='http://localhost:21001', model_path='../text-generation-webui/models/vicuna-13b/', device='cuda', gpus=None, num_gpus=2, max_gpu_memory=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, model_names=None, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-06-16 10:07:37 | INFO | model_worker | Loading the model ['vicuna-13b'] on worker 72bd9b ...
2023-06-16 10:07:37 | WARNING | accelerate.utils.modeling | The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|████████████████████████████████▎ | 1/3 [00:05<00:11, 5.82s/it]
Loading checkpoint shards: 67%|████████████████████████████████████████████████████████████████▋ | 2/3 [00:11<00:05, 5.60s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 4.81s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 5.05s/it]
2023-06-16 10:07:52 | ERROR | stderr |
Using pad_token, but it is not set yet.
2023-06-16 10:07:52 | INFO | model_worker | Register to controller
2023-06-16 10:07:53 | ERROR | stderr | INFO: Started server process [1145606]
2023-06-16 10:07:53 | ERROR | stderr | INFO: Waiting for application startup.
2023-06-16 10:07:53 | ERROR | stderr | INFO: Application startup complete.
2023-06-16 10:07:53 | ERROR | stderr | INFO: Uvicorn running on http://localhost:310001 (Press CTRL+C to quit)
2023-06-16 10:08:37 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b']. Semaphore: None. global_counter: 0. worker_id: 72bd9b.
2023-06-16 10:09:23 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b']. Semaphore: None. global_counter: 0. worker_id: 72bd9b.
2023-06-16 10:09:23 | INFO | model_worker | Register to controller
However, I am never given the option to choose vicuna in the model list.
Weirdly, after killing the t5 worker process, the controller still logs heartbeats from it.
The worker on port 21002 had already been dead for some minutes when this appeared in the controller log:
2023-06-16 10:20:38 | INFO | controller | Receive heart beat. http://localhost:31001
2023-06-16 10:20:38 | INFO | stdout | INFO: 127.0.0.1:41120 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:20:56 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:20:56 | INFO | stdout | INFO: 134.94.1.45:60724 - "POST /receive_heart_beat HTTP/1.1" 200 OK
And this is the worker's output:
2023-06-16 10:08:45 | ERROR | stderr | INFO: Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2023-06-16 10:09:06 | INFO | stdout | INFO: 127.0.0.1:39262 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:09:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:10:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:11:00 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:11:38 | INFO | stdout | INFO: 127.0.0.1:56930 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:11:45 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:12:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:13:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:14:00 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:14:45 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:15:31 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:16:16 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
2023-06-16 10:17:01 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.
^C2023-06-16 10:17:13 | ERROR | stderr | INFO: Shutting down
2023-06-16 10:17:13 | ERROR | stderr | INFO: Waiting for application shutdown.
2023-06-16 10:17:13 | ERROR | stderr | INFO: Application shutdown complete.
2023-06-16 10:17:13 | ERROR | stderr | INFO: Finished server process [1145659]
^C2023-06-16 10:17:15 | ERROR | stderr | Exception ignored in: <module 'threading' from '/easybuild/2020/software/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/threading.py'>
2023-06-16 10:17:15 | ERROR | stderr | Traceback (most recent call last):
2023-06-16 10:17:15 | ERROR | stderr | File "/easybuild/2020/software/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/threading.py", line 1560, in _shutdown
2023-06-16 10:17:15 | ERROR | stderr | lock.acquire()
2023-06-16 10:17:15 | ERROR | stderr | KeyboardInterrupt:
If I restart the web server after this (by force-closing and reopening it), I get no model available, even though vicuna is still there, sending heartbeats and showing as registered.
So, this works with fastchat.serve.gradio_web_server_multi (provided you restart the server), but not with fastchat.serve.gradio_web_server, which makes the model selection tab on the web server moot.
OK, this now works when the web server is started with --model-list-mode=reload
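For anyone landing here later, this is a toy illustration of why the flag matters. It is not FastChat code: the class ModelListProvider and its methods are hypothetical, and it assumes that the alternative mode (called "once" here) fetches the model list a single time at startup, while "reload" asks the controller again on every page load.

```python
class ModelListProvider:
    """Toy stand-in for a web server's model dropdown (hypothetical, not FastChat code)."""

    def __init__(self, fetch, mode="once"):
        if mode not in ("once", "reload"):
            raise ValueError(f"unknown mode: {mode}")
        self.fetch = fetch
        self.mode = mode
        # In "once" mode the list is fetched a single time, at startup.
        self.cached = fetch() if mode == "once" else None

    def on_page_load(self):
        # In "reload" mode every page load queries the controller again.
        if self.mode == "reload":
            return self.fetch()
        return self.cached


# Models the (pretend) controller knows about when the web server starts.
registered = ["fastchat-t5-3b-v1.0"]


def fetch():
    return list(registered)


once = ModelListProvider(fetch, mode="once")
reloading = ModelListProvider(fetch, mode="reload")

# A second worker registers after the web server is already running.
registered.append("vicuna-13b")

print(once.on_page_load())       # ['fastchat-t5-3b-v1.0']
print(reloading.on_page_load())  # ['fastchat-t5-3b-v1.0', 'vicuna-13b']
```

Under that assumption, a web server started without the flag keeps showing whatever the controller knew at startup, which matches what I saw above.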