sglang
[Bug] llava-v1.6-34b cannot enable Tensor Parallelism, server cannot start
Did you get an error message? 'service isn't ready!' usually just means the server hasn't finished loading the model weights yet, which can take a long time. For me, loading llava-v1.5-13b takes over 7 minutes, much longer than the 40 seconds you waited. Once loading finishes you should see something like this (a readiness-check sketch follows the log below):
INFO 03-27 19:02:43 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:02:43 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:10:01 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:10:01 weight_utils.py:163] Using model weights format ['*.bin']
Rank 0: load weight end.
Rank 1: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 1: max_total_num_token=52656, max_prefill_num_token=8776, context_len=4096, model_mode=[]
Rank 0: max_total_num_token=52656, max_prefill_num_token=8776, context_len=4096, model_mode=[]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO: Started server process [2211516]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:30813 (Press CTRL+C to quit)
INFO: 127.0.0.1:50816 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 9. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
INFO: 127.0.0.1:50824 - "POST /generate HTTP/1.1" 200 OK
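Rather than waiting a fixed amount of time, it can help to poll the HTTP endpoints that appear in the log above to tell when loading is done. The sketch below is not from the thread: it assumes the server listens on the same port as in the log (30813), that the `requests` library is installed, and that the native `/generate` endpoint accepts a JSON payload with `text` and `sampling_params` fields; adjust these to your own launch settings.

```python
# Hypothetical readiness check: poll /get_model_info until the weights have
# finished loading, then send a small smoke-test request to /generate.
import time
import requests

BASE_URL = "http://127.0.0.1:30813"  # port taken from the log above; yours may differ

# Keep retrying instead of giving up after a fixed delay.
for _ in range(120):  # up to ~10 minutes at 5 s per attempt
    try:
        r = requests.get(f"{BASE_URL}/get_model_info", timeout=5)
        if r.status_code == 200:
            print("server ready:", r.json())
            break
    except requests.exceptions.ConnectionError:
        pass  # server is still starting up / loading weights
    time.sleep(5)
else:
    raise RuntimeError("server never became ready")

# Smoke test against the /generate endpoint seen in the log.
# The payload shape here is an assumption about the native sglang API.
resp = requests.post(
    f"{BASE_URL}/generate",
    json={
        "text": "Describe the image.",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
    timeout=60,
)
print(resp.json())
```

If /get_model_info never responds within the timeout, the launch process most likely crashed while loading weights, and the real error will be in the server log rather than in the client output.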
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.