
Don't get API response when sending images

Open · tom-doerr opened this issue 1 year ago · 1 comment

I loaded Llava v1.6 34B on my server

export DISABLE_NEST_ASYNCIO=True
model=liuhaotian/llava-v1.6-34b 
tokenizer=liuhaotian/llava-v1.6-34b-tokenizer 

CUDA_VISIBLE_DEVICES=0,1 python3 -m sglang.launch_server --model-path $model --tokenizer-path $tokenizer --port 30813 --tp 2

It works fine with text-only prompts, but when I send images I never get a response. This is what the server log shows:

$ ./start_sglang_server.sh 
/home/tom/.local/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:104: FutureWarning: The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument has no effect
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10007
server started on [0.0.0.0]:10008
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
accepted ('127.0.0.1', 35124) with fd 35
welcome ('127.0.0.1', 35124)
accepted ('127.0.0.1', 58368) with fd 31
welcome ('127.0.0.1', 58368)
/home/tom/.local/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:144: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
  warnings.warn(
/home/tom/.local/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:144: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
  warnings.warn(
Rank 1: load weight begin.
/home/tom/.local/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:144: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
  warnings.warn(
Rank 0: load weight begin.
/home/tom/.local/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:144: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
  warnings.warn(
/home/tom/.local/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/tom/.local/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
INFO 04-09 11:54:13 weight_utils.py:163] Using model weights format ['*.safetensors']
INFO 04-09 11:54:13 weight_utils.py:163] Using model weights format ['*.safetensors']
INFO 04-09 11:54:14 weight_utils.py:163] Using model weights format ['*.safetensors']
INFO 04-09 11:54:14 weight_utils.py:163] Using model weights format ['*.safetensors']
Rank 0: load weight end.
Rank 1: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 1: max_total_num_token=1378, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Rank 0: max_total_num_token=1378, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:     Started server process [100699]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30813 (Press CTRL+C to quit)
INFO:     127.0.0.1:48684 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
INFO:     127.0.0.1:48700 - "POST /generate HTTP/1.1" 200 OK
INFO:     127.0.0.1:44048 - "GET /get_model_info HTTP/1.1" 200 OK
INFO:     127.0.0.1:43484 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 1157. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
INFO:     127.0.0.1:43490 - "POST /generate HTTP/1.1" 200 OK
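For context, a request along these lines is the kind that hung when an image was attached. The payload shape is assumed from sglang's native /generate HTTP API of that era (field names such as image_data may differ between versions), and example.png is a placeholder path:

```python
import json

# Payload shape assumed from sglang's native /generate HTTP API;
# "image_data" takes a local path or URL to the image. Field names
# may differ across sglang versions.
payload = {
    "text": "<image>\nDescribe this image in one sentence.",
    "image_data": "example.png",
    "sampling_params": {"max_new_tokens": 64, "temperature": 0},
}

print(json.dumps(payload, indent=2))

# Send with, e.g.:
#   curl http://127.0.0.1:30813/generate \
#        -H 'Content-Type: application/json' -d @payload.json
```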

The same script and server config used to work on my old system, but I can't access that machine anymore. I suspect it's just an incompatible package. Here are my package versions:

aiohttp==3.9.3
aiosignal==1.3.1
altair==5.3.0
annotated-types==0.6.0
anthropic==0.23.1
anyio==4.3.0
async-timeout==4.0.3
attrs==23.2.0
Automat==20.2.0
Babel==2.8.0
bcrypt==3.2.0
blinker==1.4
cachetools==5.3.3
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
click==8.0.3
cloud-init==23.4.4
cloudpickle==3.0.0
cmake==3.29.0.1
colorama==0.4.4
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==3.4.8
cupy-cuda12x==12.1.0
dbus-python==1.2.18
diskcache==5.6.3
distro==1.7.0
distro-info==1.1+ubuntu0.2
exceptiongroup==1.2.0
fastapi==0.110.1
fastrlock==0.8.2
filelock==3.13.3
frozenlist==1.4.1
fsspec==2024.3.1
gitdb==4.0.11
GitPython==3.1.43
h11==0.14.0
httpcore==1.0.5
httplib2==0.20.2
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.22.2
hyperlink==21.0.0
idna==3.3
importlib-metadata==4.6.4
incremental==21.3.0
interegular==0.3.3
jeepney==0.7.1
Jinja2==3.0.3
joblib==1.4.0
jsonpatch==1.32
jsonpointer==2.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
keyring==23.5.0
lark==1.1.9
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
llvmlite==0.42.0
markdown-it-py==3.0.0
MarkupSafe==2.0.1
mdurl==0.1.2
more-itertools==8.10.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
nest-asyncio==1.6.0
netifaces==0.11.0
networkx==3.3
ninja==1.11.1.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.133
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
nvitop==1.3.2
oauthlib==3.2.0
openai==1.16.2
outlines==0.0.34
packaging==24.0
pandas==2.2.1
pexpect==4.8.0
pillow==10.3.0
plumbum==1.8.2
prometheus_client==0.20.0
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyarrow==15.0.2
pyasn1==0.4.8
pyasn1-modules==0.2.1
pydantic==2.6.4
pydantic_core==2.16.3
pydeck==0.8.1b0
Pygments==2.17.2
PyGObject==3.42.1
PyHamcrest==2.0.2
PyJWT==2.3.0
pynvml==11.5.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyrsistent==0.18.1
pyserial==3.5
python-apt==2.4.0+ubuntu3
python-dateutil==2.9.0.post0
python-debian==0.1.43+ubuntu1.1
python-dotenv==1.0.1
python-magic==0.4.24
pytz==2022.1
PyYAML==5.4.1
pyzmq==25.1.2
ray==2.10.0
referencing==0.34.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rpds-py==0.18.0
rpyc==6.0.0
safetensors==0.4.2
scipy==1.13.0
screen-resolution-extra==0.0.0
SecretStorage==3.3.1
sentencepiece==0.2.0
service-identity==18.1.0
-e git+https://github.com/sgl-project/sglang.git@550a4f78f382b5a7f4008d7d21e876e71ab2d2b6#egg=sglang&subdirectory=python
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sos==4.5.6
ssh-import-id==5.11
starlette==0.37.2
streamlit==1.33.0
sympy==1.12
systemd-python==234
tenacity==8.2.3
termcolor==2.4.0
tiktoken==0.6.0
tokenizers==0.15.2
toml==0.10.2
toolz==0.12.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
transformers==4.39.3
triton==2.1.0
Twisted==22.1.0
typing_extensions==4.11.0
tzdata==2024.1
ubuntu-drivers-common==0.0.0
ubuntu-pro-client==8001
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.5
uvicorn==0.29.0
uvloop==0.19.0
vllm==0.3.3
wadllib==1.3.6
watchdog==4.0.0
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
xkit==0.0.0
yarl==1.9.4
zipp==1.0.0
zmq==0.0.0
zope.interface==5.4.0

tom-doerr · Apr 09 '24 13:04

Solved it by setting --mem-fraction-static to 0.9. Full command that works for me:

python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port 30813 --tp 2 --mem-fraction-static '0.9' 

It would help if the server emitted a warning or error in this situation, explaining what the issue is.
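As a sketch of such a check (hypothetical function, not sglang's actual code): the server could compare its KV-cache token budget against the context length at startup. In the log above, max_total_num_token=1378 is well below context_len=4096, and a single image prompt expanded to 1157 tokens, leaving almost no room for decoding:

```python
def token_budget_too_small(max_total_num_token: int, context_len: int) -> bool:
    """Return True when the KV-cache token budget cannot hold a full
    context-length request. Large multimodal prompts (an image alone
    expanded to 1157 tokens in the log above) can then stall silently."""
    return max_total_num_token < context_len

# Values taken from the server log above:
if token_budget_too_small(1378, 4096):
    print("WARNING: token budget below context length; "
          "consider raising --mem-fraction-static")
```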

tom-doerr · Apr 29 '24 00:04

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

github-actions[bot] · Jul 25 '24 06:07