I hit the following error while running the inference_vllm.py you provide for inference:
ERROR 08-19 09:33:57 pynccl.py:53] Failed to load NCCL library from libnccl.so.2. It is expected if you are not running on NVIDIA/AMD GPUs. Otherwise please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 08-19 09:33:57 pynccl_utils.py:17] Failed to import NCCL library: libnccl.so.2: cannot open shared object file: No such file or directory
INFO 08-19 09:33:57 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 08-19 09:33:57 llm_engine.py:75] Initializing an LLM engine (v0.4.0) with config: model='/data2/liushuliang/MiniCPM/OpenBMB/MiniCPM-2B-sft-bf16', tokenizer='/data2/liushuliang/MiniCPM/OpenBMB/MiniCPM-2B-sft-bf16', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 08-19 09:33:58 selector.py:16] Using FlashAttention backend.
Traceback (most recent call last):
File "/data2/liushuliang/MiniCPM/inference/inference_vllm.py", line 43, in
llm = LLM(model=args.model_path, tensor_parallel_size=1, dtype='bfloat16',trust_remote_code=True)
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 112, in init
self.llm_engine = LLMEngine.from_engine_args(
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 192, in from_engine_args
engine = cls(
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 111, in init
self.model_executor = executor_class(model_config, cache_config,
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 37, in init
self._init_worker()
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 66, in _init_worker
self.driver_worker.load_model()
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/worker/worker.py", line 107, in load_model
self.model_runner.load_model()
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 95, in load_model
self.model = get_model(
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 54, in get_model
model_class = _get_model_architecture(model_config)[0]
File "/data1/liushuliang/anaconda3/envs/MiniCPM/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 41, in _get_model_architecture
raise ValueError(
ValueError: Model architectures ['MiniCPMForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LlavaForConditionalGeneration', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OLMoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'XverseForCausalLM']
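The mismatch can also be confirmed without starting the engine. A minimal sketch, assuming transformers is installed in the same environment (the model path is the one from the log above):

```python
import vllm
from transformers import AutoConfig

# Read the architecture declared in the checkpoint's config.json and
# compare it with the installed vllm version from the traceback above.
config = AutoConfig.from_pretrained(
    "/data2/liushuliang/MiniCPM/OpenBMB/MiniCPM-2B-sft-bf16",  # local path from the log
    trust_remote_code=True,
)
print("model architectures:", config.architectures)  # expected: ['MiniCPMForCausalLM']
print("vllm version:", vllm.__version__)             # 0.4.0 in this environment
```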
I suspect my vllm version is too old. I'm on vllm 0.4.0, but the server I use can only run this version of vllm. Is there any way around this?
Sorry, one more detail: my CUDA version is 11.8. If that works, could I install a higher version of vllm?
Hi, I looked into this, and it is indeed caused by the vllm version being too old. We recommend upgrading to a newer vllm; CUDA 11.8 should be able to install a more recent vllm build.
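After upgrading (for example with `pip install --upgrade vllm`; note that official vllm wheels are built against a specific CUDA toolkit, so check the installation docs for a build matching CUDA 11.8), a quick smoke test along these lines should load the model. The sampling values below are only illustrative:

```python
from vllm import LLM, SamplingParams

# Same arguments as in inference_vllm.py line 43, with the path from the log.
llm = LLM(
    model="/data2/liushuliang/MiniCPM/OpenBMB/MiniCPM-2B-sft-bf16",
    tensor_parallel_size=1,
    dtype="bfloat16",
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=64)  # illustrative values
outputs = llm.generate(["Hello"], params)
print(outputs[0].outputs[0].text)
```

If the `ValueError` about unsupported architectures no longer appears at construction time, the new version recognizes `MiniCPMForCausalLM`.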