[Feature]: deepseek-v2 awq support
🚀 The feature, motivation and pitch
Is the AWQ version of DeepSeek-V2 supported yet? When I try to run it, I get the following error:
[rank0]:   File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 135, in pack_params
[rank0]:     w1.append(expert.gate_up_proj.weight)
[rank0]:   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
model: https://huggingface.co/casperhansen/deepseek-coder-v2-instruct-awq
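For context on the failure: `pack_params` reads the dense `weight` attribute from each expert's `gate_up_proj`, but an AWQ-quantized linear layer stores packed `qweight`/`scales`/`qzeros` tensors and has no `weight` attribute at all. A minimal sketch of that mismatch (the `DenseLinear` and `AWQLinear` classes below are illustrative stand-ins, not vLLM's real `MergedColumnParallelLinear` or AWQ implementation):

```python
# Illustrative sketch of the failure mode. DenseLinear and AWQLinear are
# stand-ins, not vLLM's actual classes.
import torch
import torch.nn as nn

class DenseLinear(nn.Module):
    """Unquantized layer: exposes a plain `weight` parameter."""
    def __init__(self, in_f: int, out_f: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_f, in_f))

class AWQLinear(nn.Module):
    """AWQ-style layer: 4-bit weights packed into int32 `qweight`; no `weight`."""
    def __init__(self, in_f: int, out_f: int, pack_factor: int = 8):
        super().__init__()
        self.qweight = nn.Parameter(
            torch.empty(in_f, out_f // pack_factor, dtype=torch.int32),
            requires_grad=False,
        )

for expert in (DenseLinear(16, 32), AWQLinear(16, 32)):
    try:
        _ = expert.weight  # the access pack_params performs on each expert
        print("ok:", type(expert).__name__)
    except AttributeError as err:
        print(err)  # 'AWQLinear' object has no attribute 'weight'
```

Running the sketch reproduces the same `AttributeError` shape the loader raises, which is why the AWQ checkpoint fails as soon as `pack_params` touches the quantized experts.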
Alternatives
No response
Additional context
No response
+1
+1, same error with vLLM==0.5.1:
INFO 07-10 09:45:35 llm_engine.py:169] Initializing an LLM engine (v0.5.1) with config: model='/usr/local/models/llm', speculative_config=None, tokenizer='/usr/local/models/llm', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/usr/local/models/llm, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-07-10 09:45:36,415] [WARN] /usr/local/api/chat_router.py(114):__init__: ERROR ChatModel not working
[2024-07-10 09:45:36,416] [WARN] /usr/local/api/chat_router.py(115):__init__: 'MergedColumnParallelLinear' object has no attribute 'weight'
[2024-07-10 09:45:36,417] [WARN] /usr/local/api/chat_router.py(116):__init__: Traceback (most recent call last):
  File "/usr/local/api/chat_router.py", line 107, in __init__
    self.chat_client = ChatLocalVLLM.from_pretraind(model_path=llm_dir, NL="\n"
  File "/usr/local/api/chat_models/chat_local_vllm.py", line 74, in from_pretraind
    engine = cls._prepare_vllm(model_path, tensor_parallel_size
  File "/usr/local/api/chat_models/chat_local_vllm.py", line 125, in _prepare_vllm
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 431, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 360, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 507, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 243, in __init__
    self.model_executor = executor_class(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 128, in __init__
    super().__init__(model_config, cache_config, parallel_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 42, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 24, in _init_executor
    self.driver_worker.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 133, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 243, in load_model
    self.model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
    return loader.load_model(model_config=model_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 267, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 104, in _initialize_model
    return model_class(config=model_config.hf_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 467, in __init__
    self.model = DeepseekV2Model(config, cache_config, quant_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 429, in __init__
    self.layers = nn.ModuleList([
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 430, in <listcomp>
    DeepseekV2DecoderLayer(config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 369, in __init__
    self.mlp = DeepseekV2MoE(config=config, quant_config=quant_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 113, in __init__
    self.pack_params()
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 135, in pack_params
    w1.append(expert.gate_up_proj.weight)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
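Until DeepSeek-V2 AWQ is supported, one way to fail fast, rather than erroring deep inside model loading, is to check the checkpoint's `quantization_config` before handing it to vLLM. A hedged sketch, assuming the `config.json` layout AutoAWQ writes (`quant_method: "awq"`) and reusing the model path from the log above:

```python
# Hedged sketch: detect an AWQ checkpoint before handing it to vLLM,
# instead of hitting the AttributeError deep inside model loading.
# The model path is the one from the log above; adjust for your setup.
import json
from pathlib import Path

def quant_method(model_dir: str) -> str | None:
    """Return e.g. 'awq' if config.json declares a quantization_config."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    return (cfg.get("quantization_config") or {}).get("quant_method")

if quant_method("/usr/local/models/llm") == "awq":
    raise RuntimeError(
        "This vLLM build cannot load DeepSeek-V2 AWQ checkpoints: "
        "pack_params assumes unquantized experts."
    )
```

For the linked casperhansen checkpoint this should return "awq", confirming the loader will take the quantized path that `pack_params` cannot handle.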
same issue
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!