
bug: Error while loading Mistral and Llama on T4 GPU

Open · sankethgadadinni opened this issue on Dec 13, 2023 · 6 comments

Describe the bug

When loading the Mistral and Llama models on a T4 GPU, I get the following error:

raise openllm.exceptions.OpenLLMException(f'Failed to initialise vLLMEngine due to the following error:\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error: Error while deserializing header: MetadataIncompleteBuffer

To reproduce

No response

Logs

No response

Environment

accelerate==0.25.0
aiohttp==3.9.1
aioprometheus==23.3.0
aiosignal==1.3.1
annotated-types==0.6.0
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
bentoml==1.1.10
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.104.1
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
litellm==1.13.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.3.9
openllm==0.4.36
openllm-client==0.4.36
openllm-core==0.4.36
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.15.0
orjson==3.9.10
packaging==23.2
pandas==2.1.4
pathspec==0.12.1
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
prometheus-client==0.19.0
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
pydantic_core==2.14.5
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.8.1
redis==5.0.1
referencing==0.32.0
regex==2023.10.3
requests==2.31.0
rich==13.7.0
rpds-py==0.13.2
rq==1.15.1
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.12
tiktoken==0.5.2
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.1
tornado==6.4
tqdm==4.66.1
transformers==4.36.0
triton==2.1.0
typing_extensions==4.9.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.4
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0

System information (Optional)

No response

sankethgadadinni · Dec 13 '23
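
For context on the traceback: MetadataIncompleteBuffer comes from the safetensors library and means a weight file's header could not be fully read, which usually points to a truncated or corrupted download rather than a GPU limitation. A minimal sketch for scanning the Hugging Face cache for unreadable shards (the cache path below is the library default and an assumption; adjust it for your machine):

```python
# Sketch: scan cached *.safetensors shards for unreadable headers.
# A MetadataIncompleteBuffer here usually means the file was only
# partially downloaded; deleting it and re-downloading typically fixes it.
from pathlib import Path

from safetensors import safe_open

cache_dir = Path.home() / ".cache" / "huggingface" / "hub"  # assumed default HF cache location

for shard in cache_dir.rglob("*.safetensors"):
    try:
        with safe_open(str(shard), framework="pt") as f:
            f.keys()  # forces the header to be deserialized
        print(f"OK      {shard}")
    except Exception as exc:
        print(f"BROKEN  {shard}: {exc}")
```

Deleting any shard flagged here and letting the model re-download is usually enough to clear this particular error.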

I think the T4 is too old for Mistral 😄

I haven't tested extensively on T4 yet, but Mistral has been tested on A10G, L4, and A100.

aarnphm · Dec 13 '23

@aarnphm I think it should work with dtype float16.

What about Llama?

sankethgadadinni · Dec 13 '23
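
On the float16 suggestion above: the T4 is compute capability 7.5 and has no bfloat16 support, so pinning float16 is the right instinct. One way to rule OpenLLM itself in or out is to load the model under vLLM directly with the dtype forced (a sketch; the model id is an example):

```python
# Sketch: load Mistral under vLLM directly with float16 forced,
# to separate dtype problems from OpenLLM-level problems.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # example model id
    dtype="float16",  # T4 has no bfloat16 support, so don't rely on "auto"
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

If this also fails with MetadataIncompleteBuffer, the weights on disk are the problem, not the serving layer.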

Llama should work.

aarnphm · Dec 13 '23

btw, float16 is the default.

aarnphm · Dec 13 '23

@aarnphm No, even Llama isn't working.

bfloat16 is the default.

sankethgadadinni · Dec 13 '23
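
On the float16-vs-bfloat16 question, both answers are partly right: as far as I can tell, vLLM's dtype option defaults to "auto", which follows the torch_dtype recorded in the model's config.json, and Mistral ships bfloat16 there, so the effective default for this model is bfloat16 even though many other models resolve to float16. A quick way to check what the card itself supports (a sketch using standard torch calls):

```python
# Sketch: check whether the current GPU supports bfloat16.
# The T4 is compute capability 7.5; bfloat16 needs 8.0+ (Ampere or newer).
import torch

print("device:", torch.cuda.get_device_name())
print("compute capability:", torch.cuda.get_device_capability())
print("bfloat16 supported:", torch.cuda.is_bf16_supported())
```

On a T4 this should report compute capability (7, 5) and bfloat16 unsupported, which is why forcing dtype float16 matters here.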

I think the T4 is too old now. 😄

aarnphm · Dec 13 '23