OpenLLM
bug: Error while loading Mistral and Llama on T4 GPU
Describe the bug
When loading the Mistral and Llama models on a T4 GPU, I get the following error:
raise openllm.exceptions.OpenLLMException(f'Failed to initialise vLLMEngine due to the following error:\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error: Error while deserializing header: MetadataIncompleteBuffer
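For what it's worth, `Error while deserializing header: MetadataIncompleteBuffer` is raised by the safetensors header parser and often indicates a truncated or partially downloaded weight file. A minimal sketch to check each shard locally; the cache path is an assumption, point it at wherever the model weights were actually downloaded:

```python
# Sketch: verify that downloaded .safetensors shards deserialize cleanly.
# "MetadataIncompleteBuffer" comes from the safetensors header parser and
# often indicates a truncated or partially downloaded weight file.
from pathlib import Path

from safetensors import safe_open

# Assumption: adjust this to wherever OpenLLM/Hugging Face cached the weights.
model_dir = Path("~/.cache/huggingface/hub").expanduser()

for shard in sorted(model_dir.rglob("*.safetensors")):
    try:
        # Opening the file parses the header; a truncated shard fails here.
        with safe_open(str(shard), framework="pt", device="cpu") as f:
            n = len(list(f.keys()))
        print(f"OK      {shard} ({n} tensors)")
    except Exception as err:
        print(f"BROKEN  {shard}: {err}")
```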
To reproduce
No response
Logs
No response
Environment
accelerate==0.25.0
aiohttp==3.9.1
aioprometheus==23.3.0
aiosignal==1.3.1
annotated-types==0.6.0
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
bentoml==1.1.10
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.104.1
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
litellm==1.13.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.3.9
openllm==0.4.36
openllm-client==0.4.36
openllm-core==0.4.36
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.15.0
orjson==3.9.10
packaging==23.2
pandas==2.1.4
pathspec==0.12.1
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
prometheus-client==0.19.0
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
pydantic_core==2.14.5
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.8.1
redis==5.0.1
referencing==0.32.0
regex==2023.10.3
requests==2.31.0
rich==13.7.0
rpds-py==0.13.2
rq==1.15.1
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.12
tiktoken==0.5.2
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.1
tornado==6.4
tqdm==4.66.1
transformers==4.36.0
triton==2.1.0
typing_extensions==4.9.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.4
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
System information (Optional)
No response
I think the T4 is too old for Mistral 😄
I haven't tested extensively on T4 yet, but Mistral has been tested on A10G, L4, and A100.
@aarnphm I think it should work with dtype float16.
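For reference, forcing the dtype on the CLI would look something like the sketch below; the `--dtype` option is from the 0.4.x CLI as far as I recall, so double-check against `openllm start --help` on your version.

```bash
# Sketch (flag name from memory, verify with `openllm start --help`):
# force float16 instead of bfloat16, which pre-Ampere cards like the T4
# (compute capability 7.5) cannot run.
openllm start mistral --dtype float16
```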
What about Llama?
Llama should work.
By the way, float16 is the default.
@aarnphm No, even Llama isn't working.
bfloat16 is the default.
I think the T4 is just too old now. 😄
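For anyone else landing here: a quick way to check whether a given GPU can run bfloat16 at all, using PyTorch. bfloat16 kernels need compute capability 8.0+ (Ampere or newer), and the T4 is 7.5, which is why forcing float16 matters on this card.

```python
# Check whether the current CUDA device supports bfloat16.
# The T4 is compute capability 7.5; bfloat16 requires 8.0+ (Ampere).
import torch

major, minor = torch.cuda.get_device_capability()
print(f"device:             {torch.cuda.get_device_name()}")
print(f"compute capability: {major}.{minor}")
print(f"bf16 supported:     {torch.cuda.is_bf16_supported()}")
```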