FastChat: Different workers with different models don't update the web interface

When I run several workers serving different models, the web interface still shows only one of them.

For example, here I have one worker on port 21002 and another on port 31001. Both run on the same machine as the controller and the web server, which has 4 GPUs.

2023-06-16 10:11:45 | INFO | stdout | INFO:     127.0.0.1:36404 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:11:56 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:11:56 | INFO | stdout | INFO:     134.94.1.45:60698 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:23 | INFO | controller | Receive unknown heart beat. http://localhost:31001
2023-06-16 10:12:23 | INFO | stdout | INFO:     127.0.0.1:47560 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:23 | INFO | controller | Register a new worker: http://localhost:31001
2023-06-16 10:12:23 | INFO | controller | Register done: http://localhost:31001, {'model_names': ['vicuna-13b'], 'speed': 1, 'queue_length': 0}
2023-06-16 10:12:23 | INFO | stdout | INFO:     127.0.0.1:47566 - "POST /register_worker HTTP/1.1" 200 OK
2023-06-16 10:12:30 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:12:30 | INFO | stdout | INFO:     127.0.0.1:47582 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:41 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:12:41 | INFO | stdout | INFO:     134.94.1.45:60700 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:13:08 | INFO | controller | Receive heart beat. http://localhost:31001
2023-06-16 10:13:08 | INFO | stdout | INFO:     127.0.0.1:60632 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:13:15 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:13:15 | INFO | stdout | INFO:     127.0.0.1:38814 - "POST /receive_heart_beat HTTP/1.1" 200 OK

This looks fine: both workers are registered and sending heartbeats.

This is the worker serving fastchat-t5-3b-v1.0:

CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path lmsys/fastchat-t5-3b-v1.0
2023-06-16 10:08:37 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='lmsys/fastchat-t5-3b-v1.0', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, model_names=None, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-06-16 10:08:37 | INFO | model_worker | Loading the model ['fastchat-t5-3b-v1.0'] on worker 0be21e ...
2023-06-16 10:08:45 | INFO | model_worker | Register to controller
2023-06-16 10:08:45 | ERROR | stderr | INFO:     Started server process [1145659]
2023-06-16 10:08:45 | ERROR | stderr | INFO:     Waiting for application startup.
2023-06-16 10:08:45 | ERROR | stderr | INFO:     Application startup complete.
2023-06-16 10:08:45 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2023-06-16 10:09:06 | INFO | stdout | INFO:     127.0.0.1:39262 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:09:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:10:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e.

And this is the worker serving vicuna-13b:

CUDA_VISIBLE_DEVICES="0,1" python3 -m fastchat.serve.model_worker --model-path ../text-generation-webui/models/vicuna-13b/ --port 310001 --worker http://localhost:31001 --num-gpus 2
2023-06-16 10:07:37 | INFO | model_worker | args: Namespace(host='localhost', port=310001, worker_address='http://localhost:31001', controller_address='http://localhost:21001', model_path='../text-generation-webui/models/vicuna-13b/', device='cuda', gpus=None, num_gpus=2, max_gpu_memory=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, model_names=None, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-06-16 10:07:37 | INFO | model_worker | Loading the model ['vicuna-13b'] on worker 72bd9b ...
2023-06-16 10:07:37 | WARNING | accelerate.utils.modeling | The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.05s/it]
2023-06-16 10:07:52 | ERROR | stderr | 
Using pad_token, but it is not set yet.
2023-06-16 10:07:52 | INFO | model_worker | Register to controller
2023-06-16 10:07:53 | ERROR | stderr | INFO:     Started server process [1145606]
2023-06-16 10:07:53 | ERROR | stderr | INFO:     Waiting for application startup.
2023-06-16 10:07:53 | ERROR | stderr | INFO:     Application startup complete.
2023-06-16 10:07:53 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:310001 (Press CTRL+C to quit)
2023-06-16 10:08:37 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b']. Semaphore: None. global_counter: 0. worker_id: 72bd9b. 
2023-06-16 10:09:23 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b']. Semaphore: None. global_counter: 0. worker_id: 72bd9b. 
2023-06-16 10:09:23 | INFO | model_worker | Register to controller

However, vicuna-13b never appears as an option in the model list.
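One detail in the vicuna command may be worth double-checking: it passes `--port 310001` (the args dump confirms `port=310001`) together with `--worker http://localhost:31001`. 310001 is outside the valid TCP port range, and it also differs from the port in the address the worker reports to the controller, which the controller presumably uses to reach it. The following is a minimal sketch of a pre-launch sanity check using only the Python standard library; `check_worker_args` is a hypothetical helper, not part of FastChat.

```python
from urllib.parse import urlparse


def check_worker_args(port: int, worker_address: str) -> list[str]:
    """Return problems found in a model_worker's --port / --worker pair.

    Hypothetical helper for illustration only; it is not part of FastChat.
    """
    problems = []
    # TCP port numbers are 16-bit, so values above 65535 are invalid.
    if not 1 <= port <= 65535:
        problems.append(f"--port {port} is outside the valid range 1-65535")
    # The worker advertises --worker to the controller, so the port in that
    # URL should be the one the worker actually listens on.
    advertised = urlparse(worker_address).port
    if advertised != port:
        problems.append(
            f"--port {port} does not match the port in --worker ({advertised})"
        )
    return problems


if __name__ == "__main__":
    # Values taken from the two model_worker commands in this issue.
    print(check_worker_args(21002, "http://localhost:21002"))   # T5 worker
    print(check_worker_args(310001, "http://localhost:31001"))  # vicuna worker
```

With the values from this issue, the T5 worker's arguments produce an empty list, while the vicuna worker's arguments report both the out-of-range port and the mismatch.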

Oddly, after killing the T5 worker process, the controller still logs heartbeats for it. The worker on port 21002 has been dead for several minutes, yet this still shows up in the log:

2023-06-16 10:20:38 | INFO | controller | Receive heart beat. http://localhost:31001
2023-06-16 10:20:38 | INFO | stdout | INFO:     127.0.0.1:41120 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:20:56 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:20:56 | INFO | stdout | INFO:     134.94.1.45:60724 - "POST /receive_heart_beat HTTP/1.1" 200 OK
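A possible clue is in the client addresses. Each controller line appears to print the worker URL that the heartbeat reports, while the uvicorn access line after it shows where the request actually came from. Heartbeats reported as `http://localhost:21002` arrive from `127.0.0.1` (for example at 10:12:30 and 10:13:15, matching the T5 worker's own "Send heart beat" times) and also from `134.94.1.45` (10:11:56, 10:12:41, and 10:20:56, the last one after the T5 worker was stopped at 10:17:13). One reading, not confirmed in this thread, is that a second process on another host is also announcing itself as `http://localhost:21002`. The sketch below groups heartbeats by reported URL and source IP; `heartbeat_sources` is a hypothetical helper, and it assumes each controller line is immediately followed by its access-log line, as in the excerpts above.

```python
import re
from collections import defaultdict

# Matches "Receive heart beat. <url>" and "Receive unknown heart beat. <url>".
HEARTBEAT_RE = re.compile(r"Receive (?:unknown )?heart beat\. (\S+)")
# Matches the uvicorn access line and captures the client IP.
ACCESS_RE = re.compile(r'INFO:\s+([\d.]+):\d+ - "POST /receive_heart_beat')

# A few controller log lines copied from this issue.
LOG = """\
2023-06-16 10:11:56 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:11:56 | INFO | stdout | INFO:     134.94.1.45:60698 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:23 | INFO | controller | Receive unknown heart beat. http://localhost:31001
2023-06-16 10:12:23 | INFO | stdout | INFO:     127.0.0.1:47560 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-06-16 10:12:30 | INFO | controller | Receive heart beat. http://localhost:21002
2023-06-16 10:12:30 | INFO | stdout | INFO:     127.0.0.1:47582 - "POST /receive_heart_beat HTTP/1.1" 200 OK
"""


def heartbeat_sources(log_lines):
    """Map each reported worker URL to the set of client IPs that sent it."""
    sources = defaultdict(set)
    pending_url = None
    for line in log_lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            # Remember the reported URL until its access-log line shows up.
            pending_url = m.group(1)
            continue
        m = ACCESS_RE.search(line)
        if m and pending_url is not None:
            sources[pending_url].add(m.group(1))
            pending_url = None
    return dict(sources)


if __name__ == "__main__":
    for url, ips in sorted(heartbeat_sources(LOG.splitlines()).items()):
        print(url, sorted(ips))
```

On the sample lines, `http://localhost:21002` maps to both `127.0.0.1` and `134.94.1.45`, while `http://localhost:31001` maps only to `127.0.0.1`.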

And this is the T5 worker's output, up to the point where I stopped it:

2023-06-16 10:08:45 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2023-06-16 10:09:06 | INFO | stdout | INFO:     127.0.0.1:39262 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:09:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:10:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:11:00 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:11:38 | INFO | stdout | INFO:     127.0.0.1:56930 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-06-16 10:11:45 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:12:30 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:13:15 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:14:00 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:14:45 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:15:31 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:16:16 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
2023-06-16 10:17:01 | INFO | model_worker | Send heart beat. Models: ['fastchat-t5-3b-v1.0']. Semaphore: None. global_counter: 0. worker_id: 0be21e. 
^C2023-06-16 10:17:13 | ERROR | stderr | INFO:     Shutting down
2023-06-16 10:17:13 | ERROR | stderr | INFO:     Waiting for application shutdown.
2023-06-16 10:17:13 | ERROR | stderr | INFO:     Application shutdown complete.
2023-06-16 10:17:13 | ERROR | stderr | INFO:     Finished server process [1145659]
^C2023-06-16 10:17:15 | ERROR | stderr | Exception ignored in: <module 'threading' from '/easybuild/2020/software/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/threading.py'>
2023-06-16 10:17:15 | ERROR | stderr | Traceback (most recent call last):
2023-06-16 10:17:15 | ERROR | stderr |   File "/easybuild/2020/software/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/threading.py", line 1560, in _shutdown
2023-06-16 10:17:15 | ERROR | stderr |     lock.acquire()
2023-06-16 10:17:15 | ERROR | stderr | KeyboardInterrupt:

If I then restart the web server (force-closing and relaunching it), no models are available at all, even though the vicuna worker is still sending heartbeats and shows as registered.

Jun 16 '23 10:06 surak

So this works with fastchat.serve.gradio_web_server_multi (provided you restart the server), but not with fastchat.serve.gradio_web_server, which makes the model selection tab in the web server moot.

Jun 16 '23 10:06 surak

OK, this now works with --model-list-mode=reload.
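To illustrate the difference the flag makes, here is a simplified, hypothetical model of the two modes, not FastChat's actual code. It assumes that with the default `once` mode the web server fetches the model list from the controller a single time when it starts, whereas `reload` fetches it again on every page load; `ModelList` and `fetch_models` are illustrative names.

```python
from collections.abc import Callable


class ModelList:
    """Hypothetical stand-in for where a web UI gets its model dropdown.

    mode="once":   ask the controller at startup and keep that answer.
    mode="reload": ask the controller again on every page load.
    """

    def __init__(self, fetch_models: Callable[[], list[str]], mode: str = "once"):
        if mode not in ("once", "reload"):
            raise ValueError(f"unknown mode: {mode!r}")
        self._fetch_models = fetch_models
        self._mode = mode
        # Both modes populate the list once when the server starts.
        self._models = fetch_models()

    def on_page_load(self) -> list[str]:
        # Only "reload" goes back to the controller; "once" serves the cache.
        if self._mode == "reload":
            self._models = self._fetch_models()
        return list(self._models)


if __name__ == "__main__":
    # Simulated controller state: only the T5 worker is registered at startup.
    registered = ["fastchat-t5-3b-v1.0"]

    def fetch_models() -> list[str]:
        return sorted(registered)

    once = ModelList(fetch_models, mode="once")
    reloading = ModelList(fetch_models, mode="reload")

    registered.append("vicuna-13b")  # a second worker registers later

    print(once.on_page_load())       # ['fastchat-t5-3b-v1.0']
    print(reloading.on_page_load())  # ['fastchat-t5-3b-v1.0', 'vicuna-13b']
```

Under that assumption, a worker that registers after the web server has started only shows up in `reload` mode, which is consistent with vicuna-13b missing from the dropdown.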

Oct 02 '23 20:10 surak
