bug: can't load GPTQ quantized model

BEpresent opened this issue 2 years ago · 2 comments

Describe the bug

I am trying to run one of TheBloke's quantized models on an A100 40GB. It is not one of the most recent models.

To reproduce

openllm start llama --model-id TheBloke/WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ --quantize gptq

However, I get the following error:

2023-09-28T13:40:58+0000 [ERROR] [runner:llm-llama-runner:1] An exception occurred while instantiating runner 'llm-llama-runner', see details below:
2023-09-28T13:40:58+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_llm.py", line 1166, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_llm.py", line 748, in model
    model = self.load_model(*self._model_decls, **self._model_attrs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_assign.py", line 71, in inner
    return fn(self, *decls, **attrs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/serialisation/transformers/__init__.py", line 182, in load_model
    model = auto_class.from_pretrained(llm._bentomodel.path, *decls, config=config, trust_remote_code=llm.trust_remote_code, device_map=device_map, **hub_attrs, **attrs).eval()
  File "/home/be/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/home/be/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2556, in from_pretrained
    quantization_method_from_config = config.quantization_config.get(
AttributeError: 'GPTQConfig' object has no attribute 'get'

2023-09-28T13:40:58+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/home/be/.local/lib/python3.9/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/be/.local/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_llm.py", line 1166, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_llm.py", line 748, in model
    model = self.load_model(*self._model_decls, **self._model_attrs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/_assign.py", line 71, in inner
    return fn(self, *decls, **attrs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/be/.local/lib/python3.9/site-packages/openllm/serialisation/transformers/__init__.py", line 182, in load_model
    model = auto_class.from_pretrained(llm._bentomodel.path, *decls, config=config, trust_remote_code=llm.trust_remote_code, device_map=device_map, **hub_attrs, **attrs).eval()
  File "/home/be/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/home/be/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2556, in from_pretrained
    quantization_method_from_config = config.quantization_config.get(
AttributeError: 'GPTQConfig' object has no attribute 'get'
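The last two frames show the mismatch: transformers 4.31 (the version in the environment below) reads `config.quantization_config` with dict-style `.get(...)`, but here that attribute holds a `GPTQConfig` object, which has no `get` method. A minimal stand-in class reproduces the pattern (`GPTQConfigLike` is hypothetical, mimicking an attribute-style config object, not the real transformers `GPTQConfig`):

```python
# Hypothetical stand-in for an attribute-style quantization config,
# illustrating why dict-style access crashes.
class GPTQConfigLike:
    def __init__(self, bits=4, quant_method="gptq"):
        self.bits = bits
        self.quant_method = quant_method

    def to_dict(self):
        # The real GPTQConfig also exposes a to_dict() helper.
        return dict(self.__dict__)

config_quantization = GPTQConfigLike()

# What the failing transformers frame effectively does (dict-style access
# on an object): raises AttributeError, as in the traceback above.
try:
    config_quantization.get("quant_method")
except AttributeError as exc:
    print(exc)  # 'GPTQConfigLike' object has no attribute 'get'

# Normalising to a dict first avoids the crash:
as_dict = (config_quantization.to_dict()
           if hasattr(config_quantization, "to_dict")
           else config_quantization)
print(as_dict.get("quant_method"))  # gptq
```

This is only a sketch of the failure mode, not a claim about where the fix belongs (openllm passing a dict vs. transformers accepting the object).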

Environment

System information

bentoml: 1.1.6
python: 3.9.2
platform: Linux-5.10.0-23-cloud-amd64-x86_64-with-glibc2.31
uid_gid: 1000:1001
conda: 23.5.0
in_conda_env: True

name: base
channels:

  • defaults

dependencies:

  • _libgcc_mutex=0.1=main
  • _openmp_mutex=5.1=1_gnu
  • boltons=23.0.0=py310h06a4308_0
  • brotlipy=0.7.0=py310h7f8727e_1002
  • bzip2=1.0.8=h7b6447c_0
  • ca-certificates=2023.01.10=h06a4308_0
  • certifi=2023.5.7=py310h06a4308_0
  • cffi=1.15.1=py310h5eee18b_3
  • charset-normalizer=2.0.4=pyhd3eb1b0_0
  • conda=23.5.0=py310h06a4308_0
  • conda-content-trust=0.1.3=py310h06a4308_0
  • conda-package-handling=2.1.0=py310h06a4308_0
  • conda-package-streaming=0.8.0=py310h06a4308_0
  • cryptography=39.0.1=py310h9ce1e76_0
  • jsonpatch=1.32=pyhd3eb1b0_0
  • jsonpointer=2.1=pyhd3eb1b0_0
  • ld_impl_linux-64=2.38=h1181459_1
  • libffi=3.4.4=h6a678d5_0
  • libgcc-ng=11.2.0=h1234567_1
  • libgomp=11.2.0=h1234567_1
  • libstdcxx-ng=11.2.0=h1234567_1
  • libuuid=1.41.5=h5eee18b_0
  • ncurses=6.4=h6a678d5_0
  • openssl=1.1.1t=h7f8727e_0
  • packaging=23.0=py310h06a4308_0
  • pip=23.0.1=py310h06a4308_0
  • pluggy=1.0.0=py310h06a4308_1
  • pycosat=0.6.4=py310h5eee18b_0
  • pycparser=2.21=pyhd3eb1b0_0
  • pyopenssl=23.0.0=py310h06a4308_0
  • pysocks=1.7.1=py310h06a4308_0
  • python=3.10.10=h7a1cb2a_2
  • readline=8.2=h5eee18b_0
  • ruamel.yaml=0.17.21=py310h5eee18b_0
  • ruamel.yaml.clib=0.2.6=py310h5eee18b_1
  • setuptools=65.6.3=py310h06a4308_0
  • six=1.16.0=pyhd3eb1b0_1
  • sqlite=3.41.2=h5eee18b_0
  • tk=8.6.12=h1ccaba5_0
  • toolz=0.12.0=py310h06a4308_0
  • tqdm=4.65.0=py310h2f386ee_0
  • urllib3=1.26.15=py310h06a4308_0
  • wheel=0.38.4=py310h06a4308_0
  • xz=5.4.2=h5eee18b_0
  • zlib=1.2.13=h5eee18b_0
  • zstandard=0.19.0=py310h5eee18b_0
  • pip:
    • absl-py==1.4.0
    • accelerate==0.21.0
    • addict==2.4.0
    • aenum==3.1.12
    • aiofiles==23.1.0
    • aiohttp==3.8.4
    • aiosignal==1.3.1
    • altair==5.0.1
    • annotated-types==0.5.0
    • antlr4-python3-runtime==4.9.3
    • anyio==3.7.0
    • appdirs==1.4.4
    • asgiref==3.7.2
    • async-timeout==4.0.2
    • attrs==23.1.0
    • basicsr==1.4.2
    • beautifulsoup4==4.12.2
    • bentoml==1.1.0
    • blendmodes==2022
    • build==0.10.0
    • cachetools==5.3.1
    • cattrs==23.1.2
    • chardet==4.0.0
    • circus==0.18.0
    • clean-fid==0.1.29
    • click==8.1.3
    • click-option-group==0.5.6
    • clip==1.0
    • cloudpickle==2.2.1
    • cmake==3.26.4
    • compel==2.0.1
    • contextlib2==21.6.0
    • contourpy==1.0.7
    • controlnet-aux==0.0.6
    • cssselect2==0.7.0
    • cycler==0.11.0
    • deepmerge==1.1.0
    • deprecated==1.2.14
    • deprecation==2.1.0
    • diffusers==0.19.3
    • einops==0.4.1
    • exceptiongroup==1.1.1
    • facexlib==0.3.0
    • fastapi==0.100.1
    • ffmpy==0.3.0
    • filelock==3.12.2
    • filterpy==1.4.5
    • flatbuffers==23.5.26
    • font-roboto==0.0.1
    • fonts==0.0.3
    • fonttools==4.40.0
    • frozenlist==1.3.3
    • fs==2.4.16
    • fsspec==2023.6.0
    • ftfy==6.1.1
    • future==0.18.3
    • fvcore==0.1.5.post20221221
    • gdown==4.7.1
    • gfpgan==1.3.8
    • gitdb==4.0.10
    • gitpython==3.1.30
    • google-auth==2.19.1
    • google-auth-oauthlib==1.0.0
    • gradio==3.28.1
    • gradio-client==0.2.6
    • grpcio==1.54.2
    • h11==0.12.0
    • httpcore==0.15.0
    • httpx==0.24.1
    • huggingface-hub==0.15.1
    • idna==2.10
    • imageio==2.31.1
    • importlib-metadata==6.0.1
    • inflection==0.5.1
    • invisible-watermark==0.2.0
    • iopath==0.1.9
    • jinja2==3.1.2
    • jsonmerge==1.8.0
    • jsonschema==4.17.3
    • kiwisolver==1.4.4
    • kornia==0.6.7
    • lark==1.1.2
    • lazy-loader==0.2
    • lightning-utilities==0.8.0
    • linkify-it-py==2.0.2
    • lit==16.0.5.post0
    • llvmlite==0.40.1rc1
    • lmdb==1.4.1
    • lpips==0.1.4
    • lxml==4.9.2
    • markdown==3.4.3
    • markdown-it-py==2.2.0
    • markupsafe==2.1.3
    • matplotlib==3.7.1
    • mdit-py-plugins==0.3.3
    • mdurl==0.1.2
    • mediapipe==0.10.1
    • mpmath==1.3.0
    • multidict==6.0.4
    • mypy-extensions==1.0.0
    • networkx==3.1
    • numba==0.57.1
    • numpy==1.24.4
    • nvidia-cublas-cu11==11.10.3.66
    • nvidia-cuda-cupti-cu11==11.7.101
    • nvidia-cuda-nvrtc-cu11==11.7.99
    • nvidia-cuda-runtime-cu11==11.7.99
    • nvidia-cudnn-cu11==8.5.0.96
    • nvidia-cufft-cu11==10.9.0.58
    • nvidia-curand-cu11==10.2.10.91
    • nvidia-cusolver-cu11==11.4.0.1
    • nvidia-cusparse-cu11==11.7.4.91
    • nvidia-nccl-cu11==2.14.3
    • nvidia-nvtx-cu11==11.7.91
    • oauthlib==3.2.2
    • omegaconf==2.2.3
    • open-clip-torch==2.7.0
    • opencv-contrib-python==4.7.0.72
    • opencv-python==4.7.0.72
    • opentelemetry-api==1.18.0
    • opentelemetry-instrumentation==0.39b0
    • opentelemetry-instrumentation-aiohttp-client==0.39b0
    • opentelemetry-instrumentation-asgi==0.39b0
    • opentelemetry-sdk==1.18.0
    • opentelemetry-semantic-conventions==0.39b0
    • opentelemetry-util-http==0.39b0
    • orjson==3.9.1
    • pandas==2.0.2
    • pathspec==0.11.1
    • piexif==1.1.3
    • pillow==9.4.0
    • pip-requirements-parser==32.0.1
    • pip-tools==6.13.0
    • platformdirs==3.5.3
    • portalocker==2.7.0
    • prometheus-client==0.17.0
    • protobuf==3.20.0
    • psutil==5.9.5
    • pyasn1==0.5.0
    • pyasn1-modules==0.3.0
    • pydantic==2.1.1
    • pydantic-core==2.4.0
    • pydub==0.25.1
    • pygments==2.15.1
    • pynvml==11.5.0
    • pyparsing==3.0.9
    • pyproject-hooks==1.0.0
    • pyre-extensions==0.0.29
    • pyrsistent==0.19.3
    • python-dateutil==2.8.2
    • python-json-logger==2.0.7
    • python-multipart==0.0.6
    • pytorch-lightning==1.9.4
    • pytz==2023.3
    • pywavelets==1.4.1
    • pyyaml==6.0
    • pyzmq==25.1.0
    • realesrgan==0.3.0
    • regex==2023.6.3
    • reportlab==4.0.4
    • requests==2.25.1
    • requests-oauthlib==1.3.1
    • resize-right==0.0.2
    • rich==13.4.2
    • rsa==4.9
    • safetensors==0.3.1
    • schema==0.7.5
    • scikit-image==0.19.2
    • scipy==1.10.1
    • semantic-version==2.10.0
    • sentencepiece==0.1.99
    • simple-di==0.1.5
    • smmap==5.0.0
    • sniffio==1.3.0
    • sounddevice==0.4.6
    • soupsieve==2.4.1
    • starlette==0.27.0
    • svglib==1.5.1
    • sympy==1.12
    • tabulate==0.9.0
    • tb-nightly==2.14.0a20230613
    • tensorboard-data-server==0.7.0
    • termcolor==2.3.0
    • tifffile==2023.4.12
    • timm==0.6.7
    • tinycss2==1.2.1
    • tokenizers==0.13.3
    • tomesd==0.1.3
    • tomli==2.0.1
    • torch==2.0.1
    • torchdiffeq==0.2.3
    • torchmetrics==0.11.4
    • torchsde==0.2.5
    • torchvision==0.15.2
    • tornado==6.3.2
    • trampoline==0.1.2
    • transformers==4.31.0
    • triton==2.0.0
    • typing-extensions==4.6.3
    • typing-inspect==0.9.0
    • tzdata==2023.3
    • uc-micro-py==1.0.2
    • uvicorn==0.22.0
    • watchfiles==0.19.0
    • wcwidth==0.2.6
    • webencodings==0.5.1
    • websockets==11.0.3
    • werkzeug==2.3.6
    • wrapt==1.15.0
    • xformers==0.0.20
    • yacs==0.1.8
    • yapf==0.40.0
    • yarl==1.9.2
    • zipp==3.15.0

BEpresent avatar Sep 28 '23 13:09 BEpresent

I'm trying to run "TheBloke/Llama-2-13B-chat-GPTQ" using version 0.3.6 and I get the same error:

2023-10-13T09:36:44+0300 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/starlette/routing.py", line 705, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_llm.py", line 1166, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_llm.py", line 748, in model
    model = self.load_model(*self._model_decls, **self._model_attrs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_assign.py", line 71, in inner
    return fn(self, *decls, **attrs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/serialisation/transformers/__init__.py", line 182, in load_model
    model = auto_class.from_pretrained(llm._bentomodel.path, *decls, config=config, trust_remote_code=llm.trust_remote_code, device_map=device_map, **hub_attrs, **attrs).eval()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2683, in from_pretrained
    quantization_method_from_config = config.quantization_config.get(
AttributeError: 'GPTQConfig' object has no attribute 'get'

I wonder whether this is related to the models being not quite recent ones? (In light of the previous comment)

soydan avatar Oct 13 '23 06:10 soydan

> I wonder whether this is related to the models being not quite recent ones? (In light of the previous comment)

This could be the case. On the TGI repo they mention it may have to do with an old quantization script from TheBloke (the error in TGI is different, but my guess is the root cause is similar).
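Whatever the model's age, a crash like this usually hinges on which versions of the GPTQ-related packages are installed (both reporters above are on transformers 4.31.x, where GPTQ support was still new). Printing the installed versions is a cheap first diagnostic; the package names below are the usual PyPI ones, adjust if your setup differs:

```python
# List versions of the packages typically involved in GPTQ model loading.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "optimum", "auto-gptq", "openllm", "bentoml"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Including the exact versions in a report like this makes it much easier to tell whether two people are hitting the same bug or two different ones.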

BEpresent avatar Oct 13 '23 08:10 BEpresent