ms-swift
Inference is noticeably slower after fine-tuning DeepSeek-Coder-V2. How can I activate only the MoE parameters?
My code:
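import torch
# Imports assumed from the ms-swift 2.x `swift.llm` module.
from swift.llm import (ModelType, get_default_template_type, get_template,
                       get_vllm_engine, inference_vllm)

def load_model(ckpt_dir):
    model_type = ModelType.deepseek_v2_lite_chat
    template_type = get_default_template_type(model_type)
    # Build the vLLM engine from the fine-tuned checkpoint, sharded across all visible GPUs.
    llm_engine = get_vllm_engine(
        model_type, model_id_or_path=ckpt_dir, tensor_parallel_size=torch.cuda.device_count(),
        max_model_len=16384, gpu_memory_utilization=0.95, cache_dir='.cache')
    llm_engine.generation_config.max_new_tokens = 8192
    tokenizer = llm_engine.hf_tokenizer
    template = get_template(template_type, tokenizer)
    return llm_engine, template

# `ckpt_dir` and `query` are defined elsewhere.
llm_engine, template = load_model(ckpt_dir=ckpt_dir)
responses = inference_vllm(llm_engine, template, query)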
These two things should be unrelated. At inference time all parameters are frozen, so activating MoE parameters should not come into play.
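To illustrate why the two are separate: in a MoE layer, which experts run for a given token is decided by the router during the forward pass, not by which parameters were trainable during fine-tuning. Below is a minimal, generic sketch of top-k routing in plain Python. It is not DeepSeek-V2's or vLLM's actual implementation; the names `top_k_route` and `moe_forward`, the toy experts, and the renormalization of the top-k weights are all assumptions made for illustration.

```python
import math

def top_k_route(router_logits, k):
    """Hypothetical helper: pick the k highest-scoring experts for one token."""
    # Numerically stable softmax over all experts.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k most probable experts, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected weights so they sum to 1 (an assumption;
    # real models differ on whether they do this).
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k):
    """Hypothetical helper: run only the selected experts and mix their outputs."""
    routed = top_k_route(router_logits, k)
    output = sum(w * experts[i](x) for i, w in routed)
    used = [i for i, _ in routed]
    return output, used

# Four toy "experts": expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: x * s for i in range(4)]

output, used = moe_forward(1.0, experts, [0.1, 2.0, -1.0, 1.5], k=2)
print(used)  # indices of the experts that actually ran for this token
```

Only the `k` selected experts are evaluated per token, whether or not their weights were updated during fine-tuning, so a slowdown after fine-tuning more likely comes from something else, such as longer generated outputs or a different engine configuration.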