OpenLLM
bug: Error while loading Mistral and Llama on T4 GPU
Describe the bug
When loading the Mistral and Llama models on a T4 GPU, I get the following error:
raise openllm.exceptions.OpenLLMException(f'Failed to initialise vLLMEngine due to the following error:\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error: Error while deserializing header: MetadataIncompleteBuffer
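For what it's worth, `Error while deserializing header: MetadataIncompleteBuffer` is raised by the safetensors header parser and often indicates a truncated or partially downloaded weight file. A minimal sketch to check each shard locally; the cache path is an assumption, point it at wherever the model weights were actually downloaded:

```python
# Sketch: verify that downloaded .safetensors shards deserialize cleanly.
# "MetadataIncompleteBuffer" comes from the safetensors header parser and
# often indicates a truncated or partially downloaded weight file.
from pathlib import Path

from safetensors import safe_open

# Assumption: adjust this to wherever OpenLLM/Hugging Face cached the weights.
model_dir = Path("~/.cache/huggingface/hub").expanduser()

for shard in sorted(model_dir.rglob("*.safetensors")):
    try:
        # Opening the file parses the header; a truncated shard fails here.
        with safe_open(str(shard), framework="pt", device="cpu") as f:
            n = len(list(f.keys()))
        print(f"OK      {shard} ({n} tensors)")
    except Exception as err:
        print(f"BROKEN  {shard}: {err}")
```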
To reproduce
No response
Logs
No response
Environment
accelerate==0.25.0
aiohttp==3.9.1
aioprometheus==23.3.0
aiosignal==1.3.1
annotated-types==0.6.0
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
bentoml==1.1.10
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.104.1
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
litellm==1.13.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.3.9
openllm==0.4.36
openllm-client==0.4.36
openllm-core==0.4.36
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.15.0
orjson==3.9.10
packaging==23.2
pandas==2.1.4
pathspec==0.12.1
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
prometheus-client==0.19.0
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
pydantic_core==2.14.5
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.8.1
redis==5.0.1
referencing==0.32.0
regex==2023.10.3
requests==2.31.0
rich==13.7.0
rpds-py==0.13.2
rq==1.15.1
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.12
tiktoken==0.5.2
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.1
tornado==6.4
tqdm==4.66.1
transformers==4.36.0
triton==2.1.0
typing_extensions==4.9.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.4
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
System information (Optional)
No response
I think the T4 is too old for Mistral 😄
I haven't tested extensively on T4 yet, but Mistral has been tested on A10G, L4, and A100.
@aarnphm I think it should work with dtype float16.
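For reference, forcing the dtype on the CLI would look something like the sketch below; the `--dtype` option is from the 0.4.x CLI as far as I recall, so double-check against `openllm start --help` on your version.

```bash
# Sketch (flag name from memory, verify with `openllm start --help`):
# force float16 instead of bfloat16, which pre-Ampere cards like the T4
# (compute capability 7.5) cannot run.
openllm start mistral --dtype float16
```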
What about Llama?
Llama should work.
By the way, float16 is the default.
@aarnphm No, even Llama isn't working.
bfloat16 is the default.
I think the T4 is just too old now. 😄
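For anyone else landing here: a quick way to check whether a given GPU can run bfloat16 at all, using PyTorch. bfloat16 kernels need compute capability 8.0+ (Ampere or newer), and the T4 is 7.5, which is why forcing float16 matters on this card.

```python
# Check whether the current CUDA device supports bfloat16.
# The T4 is compute capability 7.5; bfloat16 requires 8.0+ (Ampere).
import torch

major, minor = torch.cuda.get_device_capability()
print(f"device:             {torch.cuda.get_device_name()}")
print(f"compute capability: {major}.{minor}")
print(f"bf16 supported:     {torch.cuda.is_bf16_supported()}")
```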