sglang
[Bug] llava-v1.6-34b cannot enable Tensor Parallelism, server cannot start
Did you get an error message? 'service isn't ready!' usually just means the server hasn't finished loading the model weights yet, which can take a long time. For me, loading llava-v1.5-13b takes over 7 minutes, much longer than the 40 seconds you waited. Once loading finishes you should see something like this (a readiness-check sketch follows the log below):
INFO 03-27 19:02:43 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:02:43 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:10:01 weight_utils.py:163] Using model weights format ['*.bin']
INFO 03-27 19:10:01 weight_utils.py:163] Using model weights format ['*.bin']
Rank 0: load weight end.
Rank 1: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 1: max_total_num_token=52656, max_prefill_num_token=8776, context_len=4096, model_mode=[]
Rank 0: max_total_num_token=52656, max_prefill_num_token=8776, context_len=4096, model_mode=[]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO: Started server process [2211516]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:30813 (Press CTRL+C to quit)
INFO: 127.0.0.1:50816 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 9. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
INFO: 127.0.0.1:50824 - "POST /generate HTTP/1.1" 200 OK
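Rather than waiting a fixed amount of time, it can help to poll the HTTP endpoints that appear in the log above to tell when loading is done. The sketch below is not from the thread: it assumes the server listens on the same port as in the log (30813), that the `requests` library is installed, and that the native `/generate` endpoint accepts a JSON payload with `text` and `sampling_params` fields; adjust these to your own launch settings.

```python
# Hypothetical readiness check: poll /get_model_info until the weights have
# finished loading, then send a small smoke-test request to /generate.
import time
import requests

BASE_URL = "http://127.0.0.1:30813"  # port taken from the log above; yours may differ

# Keep retrying instead of giving up after a fixed delay.
for _ in range(120):  # up to ~10 minutes at 5 s per attempt
    try:
        r = requests.get(f"{BASE_URL}/get_model_info", timeout=5)
        if r.status_code == 200:
            print("server ready:", r.json())
            break
    except requests.exceptions.ConnectionError:
        pass  # server is still starting up / loading weights
    time.sleep(5)
else:
    raise RuntimeError("server never became ready")

# Smoke test against the /generate endpoint seen in the log.
# The payload shape here is an assumption about the native sglang API.
resp = requests.post(
    f"{BASE_URL}/generate",
    json={
        "text": "Describe the image.",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
    timeout=60,
)
print(resp.json())
```

If /get_model_info never responds within the timeout, the launch process most likely crashed while loading weights, and the real error will be in the server log rather than in the client output.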
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.