OpenLLM
bug: OpenLLM query in WSL failed with a timeout
Describe the bug
Hi,
When trying to query meta-llama/Llama-2-7b-chat-hf with a simple prompt (e.g. 'Hello'), the query fails with a timeout.
To be clear, I launched the server in one WSL session using the openllm start ... command, and in another WSL session I tried to query it using the openllm query ... command.
I tried adding the flag --timeout 300 to both the start command (i.e. openllm start llama --model-id meta... --timeout 300) and the query command (i.e. openllm query --timeout 300), but it doesn't seem to be taken into account: the query still fails after 30 seconds (which I believe is the default setting).
I don't really get why this fails, and the logs are not helping (even with the --debug flag). Maybe the model is too big or something.
Do you have any idea why this happens? Maybe I'm doing something wrong with my WSL session.
To reproduce
No response
Logs
No response
Environment
accelerate 0.22.0
aiohttp 3.8.5
aiosignal 1.3.1
anyio 4.0.0
appdirs 1.4.4
asgiref 3.7.2
async-timeout 4.0.3
attrs 23.1.0
beautifulsoup4 4.12.2
bentoml 1.1.6
bitsandbytes 0.41.1
boto3 1.28.43
botocore 1.31.43
bpemb 0.3.4
build 1.0.3
cattrs 23.1.2
certifi 2023.7.22
charset-normalizer 3.2.0
circus 0.18.0
click 8.1.7
click-option-group 0.5.6
cloudpickle 2.2.1
colorama 0.4.6
coloredlogs 15.0.1
conllu 4.5.3
contextlib2 21.6.0
contourpy 1.1.0
cuda-python 12.2.0
cycler 0.11.0
Cython 3.0.2
datasets 2.14.5
deepmerge 1.1.0
Deprecated 1.2.14
dill 0.3.7
diskcache 5.6.3
emoji 2.8.0
exceptiongroup 1.1.3
fairscale 0.4.13
fastcore 1.5.29
filelock 3.12.3
filetype 1.2.0
flair 0.12.2
fonttools 4.42.1
frozenlist 1.4.0
fs 2.4.16
fsspec 2023.6.0
ftfy 6.1.1
future 0.18.3
gdown 4.4.0
gensim 4.3.2
ghapi 1.0.4
h11 0.14.0
httpcore 0.17.3
httpx 0.24.1
huggingface-hub 0.16.4
humanfriendly 10.0
hyperopt 0.2.7
idna 3.4
importlib-metadata 6.0.1
importlib-resources 6.0.1
inflection 0.5.1
Janome 0.5.0
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
kiwisolver 1.4.5
langdetect 1.0.9
llama-cpp-python 0.1.83
lxml 4.9.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
mdurl 0.1.2
more-itertools 10.1.0
mpld3 0.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
mypy-extensions 1.0.0
nereval 0.2.5
nervaluate 0.1.8
networkx 3.1
numpy 1.24.4
openllm 0.3.3
openllm-client 0.3.3
openllm-core 0.3.3
opentelemetry-api 1.18.0
opentelemetry-instrumentation 0.39b0
opentelemetry-instrumentation-aiohttp-client 0.39b0
opentelemetry-instrumentation-asgi 0.39b0
opentelemetry-sdk 1.18.0
opentelemetry-semantic-conventions 0.39b0
opentelemetry-util-http 0.39b0
optimum 1.13.0
orjson 3.9.6
packaging 23.1
pandas 2.0.3
pathspec 0.11.2
Pillow 10.0.0
pip 22.3.1
pip-requirements-parser 32.0.1
pip-tools 7.3.0
pptree 3.1
prometheus-client 0.17.1
protobuf 4.24.2
psutil 5.9.5
py4j 0.10.9.7
pyarrow 13.0.0
pydantic 1.10.12
Pygments 2.16.1
pynvml 11.5.0
pyparsing 3.0.9
pyproject_hooks 1.0.0
pyreadline3 3.4.1
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
pytorch_revgrad 0.2.0
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
regex 2023.8.8
requests 2.31.0
rich 13.5.2
s3transfer 0.6.2
safetensors 0.3.3
schema 0.7.5
scikit-learn 1.3.0
scipy 1.10.1
segtok 1.5.11
sentencepiece 0.1.99
setuptools 65.5.1
simphile 1.0.2
simple-di 0.1.5
six 1.16.0
smart-open 6.4.0
sniffio 1.3.0
soupsieve 2.5
sqlitedict 2.1.0
starlette 0.31.1
sympy 1.12
tabulate 0.9.0
threadpoolctl 3.2.0
tokenizers 0.13.3
tomli 2.0.1
torch 2.0.1
Unidecode 1.3.6
urllib3 1.26.16
uvicorn 0.23.2
watchfiles 0.20.0
wcwidth 0.2.6
wheel 0.38.4
Wikipedia-API 0.6.0
wrapt 1.15.0
xxhash 3.3.0
yarl 1.9.2
zipp 3.16.2
System information (Optional)
No response
I second this, running on WSL2 Ubuntu as well, under close-enough conditions that I believe it is the same problem. It only occurs with heavier models: models such as OPT don't give me this error, but models such as Falcon (and Llama) do. (edit) Even passing the timeout through the Python API results in the same error.
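For reference, this is roughly what I mean by passing the timeout through the Python API. The timeout keyword argument on the client constructor is my reading of the 0.3.x client and may differ in your installed version, so treat it as an assumption:

import openllm

# Assumption: HTTPClient takes the server address plus a timeout in seconds;
# verify the signature against your installed openllm-client version.
client = openllm.client.HTTPClient('http://localhost:3000', timeout=300)

# Still fails the same way: TimeoutError: timed out after ~30 seconds.
print(client.query('Hello'))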
I ran into the same problem with the vicuna-33b model; no problem with vicuna-13b.
I'm having the same problem with flan-t5-xl, but not with flan-t5-large, so I think it has to do with the size of the model. My setup is similar: the model runs in WSL2 and the query comes from a Windows 10 Python script.
Same for me: openllm query 'Can you name the 7 Harry Potter Books' fails with TimeoutError: timed out after 30 seconds.
Model: openllm start opt --model-id facebook/opt-30b --timeout 400 -> the --timeout parameter does not do anything.
As a workaround, I changed the call self._sock.settimeout(...) to self._sock.settimeout(3000) in the affected file under /home/ and then it worked.
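A less invasive sketch of the same workaround, if you'd rather not edit library source, is to raise Python's default socket timeout before the client opens its connection. This is my own suggestion, not an official openllm knob, and it only helps where the library doesn't set an explicit per-socket timeout itself:

import socket

# Sockets created afterwards WITHOUT an explicit timeout inherit this default.
# It will NOT override code that later calls sock.settimeout(30) explicitly,
# which is what the source edit above works around.
socket.setdefaulttimeout(3000)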
Can you try again with the latest change? It sounds like you need to update the socket_timeout variable in your shell.
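If that means an environment variable, a sketch of setting it before the client or server starts might look like this. The name socket_timeout is copied verbatim from the comment above; the exact variable name and casing are unverified, so check the openllm source before relying on it:

import os

# Hypothetical: "socket_timeout" is taken from the comment above and may not
# be the exact name openllm reads. Set it before creating the client/server.
os.environ['socket_timeout'] = '300'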
Running into the same issue, but in a Docker container inside WSL2.
Closing for openllm 0.6.