
GLM-4.1V-9B-Thinking

Open izhaolinger opened this issue 4 months ago • 1 comments

Is the GLM-4.1V-9B-Thinking model supported? I hit an exception when trying it in the following environment:

  • Docker image: intelanalytics/ipex-llm-serving-xpu:0.9.2-b22
  • GPU: Intel A770 × 4

When I run:

    python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
      --port 8001 \
      --model "/mnt/data/vllm/GLM-4.1V-9B-Thinking/" \
      --served-model-name "GLM-4.1V-9B-Thinking" \
      --trust-remote-code \
      --gpu-memory-utilization "0.95" \
      --device xpu \
      --dtype float16 \
      --enforce-eager \
      --load-in-low-bit fp8 \
      --max-model-len "10000" \
      --max-num-batched-tokens "10000" \
      --max-num-seqs "32" \
      --tensor-parallel-size "2" \
      --pipeline-parallel-size "1" \
      --distributed-executor-backend ray \
      --disable-async-output-proc

I get the following error:

    (WrapperWithLoadBit pid=2840) INFO 08-13 10:52:44 [xpu.py:43] Cannot use None backend on XPU.
    (WrapperWithLoadBit pid=2840) INFO 08-13 10:52:44 [xpu.py:51] Using IPEX attention backend.
    (WrapperWithLoadBit pid=2840) INFO 08-13 10:52:44 [parallel_state.py:1076] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368] Error in loading model architecture 'Glm4ForCausalLM'
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368] Traceback (most recent call last):
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 366, in _try_load_model_cls
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]     return model.load_model_cls()
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]            ^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 354, in load_model_cls
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]     mod = importlib.import_module(self.module_name)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]     return _bootstrap._gcd_import(name[level:], package, level)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1204, in _gcd_import
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1176, in _find_and_load
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1147, in _find_and_load_unlocked
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 690, in _load_unlocked
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 940, in exec_module
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "", line 241, in _call_with_frames_removed
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/glm4.py", line 54, in
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]     class Glm4Attention(nn.Module):
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/glm4.py", line 62, in Glm4Attention
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]     head_dim: Optional[int] = None,
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368]               ^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [registry.py:368] NameError: name 'Optional' is not defined
    ERROR 08-13 10:52:45 [registry.py:368] Error in loading model architecture 'Glm4ForCausalLM'
    ERROR 08-13 10:52:45 [registry.py:368] Traceback (most recent call last):
    ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 366, in _try_load_model_cls
    ERROR 08-13 10:52:45 [registry.py:368]     return model.load_model_cls()
    ERROR 08-13 10:52:45 [registry.py:368]            ^^^^^^^^^^^^^^^^^^^^^^
    ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 354, in load_model_cls
    ERROR 08-13 10:52:45 [registry.py:368]     mod = importlib.import_module(self.module_name)
    ERROR 08-13 10:52:45 [registry.py:368]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    ERROR 08-13 10:52:45 [registry.py:368]     return _bootstrap._gcd_import(name[level:], package, level)
    ERROR 08-13 10:52:45 [registry.py:368]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1204, in _gcd_import
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1176, in _find_and_load
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 1147, in _find_and_load_unlocked
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 690, in _load_unlocked
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 940, in exec_module
    ERROR 08-13 10:52:45 [registry.py:368]   File "", line 241, in _call_with_frames_removed
    ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/glm4.py", line 54, in
    ERROR 08-13 10:52:45 [registry.py:368]     class Glm4Attention(nn.Module):
    ERROR 08-13 10:52:45 [registry.py:368]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/glm4.py", line 62, in Glm4Attention
    ERROR 08-13 10:52:45 [registry.py:368]     head_dim: Optional[int] = None,
    ERROR 08-13 10:52:45 [registry.py:368]               ^^^^^^^^
    ERROR 08-13 10:52:45 [registry.py:368] NameError: name 'Optional' is not defined
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622] Error executing method 'load_model'. This might cause deadlock in distributed execution.
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622] Traceback (most recent call last):
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 614, in execute_method
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return run_method(self, method, args, kwargs)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/utils/__init__.py", line 2738, in run_method
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return func(*args, **kwargs)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/worker/worker.py", line 210, in load_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     self.model_runner.load_model()
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/llm/ipex-llm/python/llm/src/ipex_llm/vllm/xpu/model_convert.py", line 96, in _ipex_llm_load_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     self.model = get_model(
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]                  ^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/model_loader/__init__.py", line 63, in get_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return loader.load_model(vllm_config=vllm_config,
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     model = initialize_model(vllm_config=vllm_config,
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return model_class(vllm_config=vllm_config, prefix=prefix)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/glm4_1v.py", line 1284, in __init__
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     self.language_model = init_vllm_registered_model(
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return initialize_model(vllm_config=vllm_config, prefix=prefix)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/model_loader/utils.py", line 52, in initialize_model
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     model_class, _ = get_model_architecture(model_config)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/model_loader/utils.py", line 244, in get_model_architecture
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     model_cls, arch = ModelRegistry.resolve_model_cls(architectures)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 503, in resolve_model_cls
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     return self._raise_for_unsupported(architectures)
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]   File "/usr/local/lib/python3.11/dist-packages/vllm-0.9.2+ipexllm.xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 440, in _raise_for_unsupported
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622]     raise ValueError(
    (WrapperWithLoadBit pid=2840) ERROR 08-13 10:52:45 [worker_base.py:622] ValueError: Model architectures ['Glm4ForCausalLM'] failed to be inspected. Please check the logs for more details.

izhaolinger, Aug 13 '25 03:08

vLLM 0.9.2-b22 does not currently support running GLM-4.1V.

hzjane, Aug 14 '25 01:08
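
For anyone who wants to confirm the failure without starting the full server: the traceback in the original post shows the registry calling `importlib.import_module` on the GLM-4 module and hitting `NameError: name 'Optional' is not defined` at import time. Below is a minimal sketch that repeats just that import step. The helper name `try_import` is hypothetical (not part of vLLM or ipex-llm), the module path is the one implied by the file path in the traceback, and a clean import here would not by itself mean the model is supported.

```python
import importlib
from typing import Optional


def try_import(module_name: str) -> Optional[str]:
    """Import a module by name.

    Returns None if the import succeeds, otherwise a one-line summary
    of the exception raised while importing.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # import-time errors such as NameError land here
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Module path inferred from the traceback (.../vllm/model_executor/models/glm4.py).
    # Run this inside the serving container, where that vLLM build is installed.
    name = "vllm.model_executor.models.glm4"
    error = try_import(name)
    print(f"{name}: {'imports cleanly' if error is None else error}")
```

If this prints the same `NameError`, the problem is reproducible in isolation from Ray, tensor parallelism, and the low-bit loading flags.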