OpenLLM
bug: OpenLLM query in WSL failed with a timeout
Describe the bug
Hi,
When trying to query meta-llama/Llama-2-7b-chat-hf with a simple prompt (e.g. 'Hello'), the query fails with a timeout.
To be clear, I launched the server in one WSL session using the openllm start ... command, and in another WSL session I tried to query it using the openllm query ... command.
I tried adding the flag --timeout 300 to both the start command (i.e. openllm start llama --model-id meta... --timeout 300) and the query command (i.e. openllm query --timeout 300), but it doesn't seem to be taken into account: the query still fails after 30 seconds (which I believe is the default setting).
I don't really get why this fails, and the logs are not helping (even with the --debug flag). Maybe the model is too big or something.
Do you have any idea why this happens? Maybe I'm doing something wrong with my WSL session.
To reproduce
No response
Logs
No response
Environment
accelerate 0.22.0
aiohttp 3.8.5
aiosignal 1.3.1
anyio 4.0.0
appdirs 1.4.4
asgiref 3.7.2
async-timeout 4.0.3
attrs 23.1.0
beautifulsoup4 4.12.2
bentoml 1.1.6
bitsandbytes 0.41.1
boto3 1.28.43
botocore 1.31.43
bpemb 0.3.4
build 1.0.3
cattrs 23.1.2
certifi 2023.7.22
charset-normalizer 3.2.0
circus 0.18.0
click 8.1.7
click-option-group 0.5.6
cloudpickle 2.2.1
colorama 0.4.6
coloredlogs 15.0.1
conllu 4.5.3
contextlib2 21.6.0
contourpy 1.1.0
cuda-python 12.2.0
cycler 0.11.0
Cython 3.0.2
datasets 2.14.5
deepmerge 1.1.0
Deprecated 1.2.14
dill 0.3.7
diskcache 5.6.3
emoji 2.8.0
exceptiongroup 1.1.3
fairscale 0.4.13
fastcore 1.5.29
filelock 3.12.3
filetype 1.2.0
flair 0.12.2
fonttools 4.42.1
frozenlist 1.4.0
fs 2.4.16
fsspec 2023.6.0
ftfy 6.1.1
future 0.18.3
gdown 4.4.0
gensim 4.3.2
ghapi 1.0.4
h11 0.14.0
httpcore 0.17.3
httpx 0.24.1
huggingface-hub 0.16.4
humanfriendly 10.0
hyperopt 0.2.7
idna 3.4
importlib-metadata 6.0.1
importlib-resources 6.0.1
inflection 0.5.1
Janome 0.5.0
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
kiwisolver 1.4.5
langdetect 1.0.9
llama-cpp-python 0.1.83
lxml 4.9.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
mdurl 0.1.2
more-itertools 10.1.0
mpld3 0.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
mypy-extensions 1.0.0
nereval 0.2.5
nervaluate 0.1.8
networkx 3.1
numpy 1.24.4
openllm 0.3.3
openllm-client 0.3.3
openllm-core 0.3.3
opentelemetry-api 1.18.0
opentelemetry-instrumentation 0.39b0
opentelemetry-instrumentation-aiohttp-client 0.39b0
opentelemetry-instrumentation-asgi 0.39b0
opentelemetry-sdk 1.18.0
opentelemetry-semantic-conventions 0.39b0
opentelemetry-util-http 0.39b0
optimum 1.13.0
orjson 3.9.6
packaging 23.1
pandas 2.0.3
pathspec 0.11.2
Pillow 10.0.0
pip 22.3.1
pip-requirements-parser 32.0.1
pip-tools 7.3.0
pptree 3.1
prometheus-client 0.17.1
protobuf 4.24.2
psutil 5.9.5
py4j 0.10.9.7
pyarrow 13.0.0
pydantic 1.10.12
Pygments 2.16.1
pynvml 11.5.0
pyparsing 3.0.9
pyproject_hooks 1.0.0
pyreadline3 3.4.1
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
pytorch_revgrad 0.2.0
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
regex 2023.8.8
requests 2.31.0
rich 13.5.2
s3transfer 0.6.2
safetensors 0.3.3
schema 0.7.5
scikit-learn 1.3.0
scipy 1.10.1
segtok 1.5.11
sentencepiece 0.1.99
setuptools 65.5.1
simphile 1.0.2
simple-di 0.1.5
six 1.16.0
smart-open 6.4.0
sniffio 1.3.0
soupsieve 2.5
sqlitedict 2.1.0
starlette 0.31.1
sympy 1.12
tabulate 0.9.0
threadpoolctl 3.2.0
tokenizers 0.13.3
tomli 2.0.1
torch 2.0.1
Unidecode 1.3.6
urllib3 1.26.16
uvicorn 0.23.2
watchfiles 0.20.0
wcwidth 0.2.6
wheel 0.38.4
Wikipedia-API 0.6.0
wrapt 1.15.0
xxhash 3.3.0
yarl 1.9.2
zipp 3.16.2
System information (Optional)
No response
I second this, running on WSL2 Ubuntu as well, under close-enough conditions that I believe it is the same problem. It only occurs with heavier models: models such as OPT don't give me this error, but models such as Falcon (and Llama) do. (edit) Even passing the timeout through the Python API results in the same error.
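For reference, this is roughly what I mean by passing the timeout through the Python API. The timeout keyword argument on the client constructor is my reading of the 0.3.x client and may differ in your installed version, so treat it as an assumption:

import openllm

# Assumption: HTTPClient takes the server address plus a timeout in seconds;
# verify the signature against your installed openllm-client version.
client = openllm.client.HTTPClient('http://localhost:3000', timeout=300)

# Still fails the same way: TimeoutError: timed out after ~30 seconds.
print(client.query('Hello'))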
I ran into the same problem with the vicuna-33b model; no problem with vicuna-13b.
I'm having the same problem with flan-t5-xl, but not with flan-t5-large, so I think it has to do with the size of the model. My setup is similar: the model runs in WSL2 and the query comes from a Windows 10 Python script.
Same for me: openllm query 'Can you name the 7 Harry Potter Books' fails with TimeoutError: timed out after 30 seconds.
Model: openllm start opt --model-id facebook/opt-30b --timeout 400 -> the --timeout parameter does not do anything.
As a workaround, I changed the call self._sock.settimeout(...) to self._sock.settimeout(3000) in the affected file under /home/ and then it worked.
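A less invasive sketch of the same workaround, if you'd rather not edit library source, is to raise Python's default socket timeout before the client opens its connection. This is my own suggestion, not an official openllm knob, and it only helps where the library doesn't set an explicit per-socket timeout itself:

import socket

# Sockets created afterwards WITHOUT an explicit timeout inherit this default.
# It will NOT override code that later calls sock.settimeout(30) explicitly,
# which is what the source edit above works around.
socket.setdefaulttimeout(3000)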
Can you try again with the latest change? It sounds like you need to update the socket_timeout variable in your shell.
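If that means an environment variable, a sketch of setting it before the client or server starts might look like this. The name socket_timeout is copied verbatim from the comment above; the exact variable name and casing are unverified, so check the openllm source before relying on it:

import os

# Hypothetical: "socket_timeout" is taken from the comment above and may not
# be the exact name openllm reads. Set it before creating the client/server.
os.environ['socket_timeout'] = '300'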
Running into the same issue, but in a Docker container inside WSL2.
Closing for openllm 0.6.