lmdeploy
[Bug] qwen1.5-14b-awq service deployment error
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
Describe the bug
I generated the AWQ model with lmdeploy lite auto_awq, then deployed the server with tp=2, which produces the following error:
2024-04-30 08:25:44,179 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name='qwen', model_format='awq', tp=2, session_len=16384, max_batch_size=4, cache_max_entry_count=0.8, cache_block_seq_len=64, quant_policy=8, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-04-30 08:25:44,179 - lmdeploy - INFO - input chat_template_config=None
2024-04-30 08:25:45,243 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='qwen', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-04-30 08:25:45,243 - lmdeploy - WARNING - model_source: hf_model
2024-04-30 08:25:45,243 - lmdeploy - WARNING - model_name is deprecated in TurbomindEngineConfig and has no effect
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-04-30 08:25:46,965 - lmdeploy - WARNING - model_config:
[llama]
model_name = qwen
tensor_para_size = 2
head_num = 40
kv_head_num = 40
vocab_size = 152064
num_layer = 40
inter_size = 13696
norm_eps = 1e-06
attn_bias = 1
start_id = 151643
end_id = 151645
session_len = 16384
weight_type = int4
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 128
max_batch_size = 4
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
num_tokens_per_iter = 8192
max_prefill_iters = 2
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 8
max_position_embeddings = 32768
rope_scaling_factor = 0.0
use_dynamic_ntk = 0
use_logn_attn = 0
lora_policy =
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern =
lora_scale_pattern =
[TM][INFO] Set logger level by INFO
[TM][WARNING] [LlamaTritonModel] max_context_token_num = 16384.
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
2024-04-30 08:25:47,770 - lmdeploy - WARNING - get 883 model params
Convert to turbomind format: 0%| | 0/40 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/py38/bin/lmdeploy", line 11, in
Reproduction
lmdeploy serve api_server ../pretrained-models/qwen1.5-14b-chat-w4-lmdeploy/ --backend turbomind --model-format awq --log-level INFO --tp 2 --quant-policy 8
Environment
Using Docker image: 0.4.0
Error traceback
No response
Deploying the server with tp=1 works fine.
please help!
assert tensor.shape[split_dim] % tp == 0
The quantization parameters' tensor shape is not divisible by tp, so tensor parallelism cannot be applied.
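The divisibility check above can be sketched as follows. This is a simplified stand-in for the assertion inside lmdeploy's turbomind converter, not its actual code: `can_split` is a hypothetical helper, and the shapes are inferred from the model_config in the log (inter_size = 13696, group_size = 128), which would give the AWQ scale/zero tensors 13696 / 128 = 107 groups along the split axis — an odd count that cannot be divided across 2 GPUs, consistent with tp=1 working while tp=2 fails.

```python
# Hypothetical sketch of the tensor-parallel split check that the
# assertion enforces: a tensor split along `split_dim` across `tp`
# ranks requires that dimension to be divisible by tp.
def can_split(shape, split_dim, tp):
    """Return True if shape[split_dim] divides evenly across tp ranks."""
    return shape[split_dim] % tp == 0

# Values taken from the logged model_config (assumption: the AWQ
# scales/zeros are grouped along inter_size with group_size = 128).
inter_size, group_size = 13696, 128
num_groups = inter_size // group_size   # 107 groups

print(num_groups)                        # 107
print(can_split((num_groups,), 0, 2))    # False -> tp=2 assertion fails
print(can_split((num_groups,), 0, 1))    # True  -> tp=1 works
```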
I'm running into the same problem. How can it be solved?
It has not been solved yet.