[vllm] - AttributeError: '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
Command:
vllm serve /DATA/disk0/ld/ld_model_pretrain/MiniCPM-o-2_6 --dtype auto --max-model-len 2048 --api-key token-abc123 --gpu_memory_utilization 1 --trust-remote-code
Error:
ERROR 01-26 11:32:37 engine.py:380]   File "/autodl-fs/data/github/vllm/vllm/vllm_flash_attn/flash_attn_interface.py", line 154, in flash_attn_varlen_func
ERROR 01-26 11:32:37 engine.py:380]     out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
AttributeError: '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
Even the official tutorial raises this error!
This error doesn't look related to our frontend adaptation of MiniCPM-o-2_6... it may be worth checking your CUDA version. Also, MiniCPM-o-2_6 has now been merged into the official vLLM repository; you can pull that and try again. If you still hit an error, I can help file an issue with the vLLM team.
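For anyone debugging this, here is a minimal diagnostic sketch to see which versions are actually in play, assuming torch and vllm import from the same environment used to launch the server (the import path follows the traceback above; _vllm_fa2_C is the op namespace named in the error):

import torch
import vllm

print("vLLM:", vllm.__version__)
print("PyTorch:", torch.__version__, "| built for CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# Importing the vendored flash-attn interface registers the compiled ops;
# a precompiled wheel that doesn't match the checked-out source can leave
# varlen_fwd unregistered, which produces exactly this AttributeError.
from vllm.vllm_flash_attn import flash_attn_interface  # noqa: F401
print("varlen_fwd registered:", hasattr(torch.ops._vllm_fa2_C, "varlen_fwd"))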
@HwwwwwwwH I'm also facing the same issue with your fork.
I first followed these steps:
git clone https://github.com/OpenBMB/vllm.git
cd vllm
git checkout minicpmo
python3 -m venv myenv
source myenv/bin/activate
VLLM_USE_PRECOMPILED=1 pip install --editable . --no-cache-dir
export HF_TOKEN=<my-hf-token>
vllm serve openbmb/MiniCPM-o-2_6 \
--trust-remote-code \
--max-model-len 2048 \
--max-num-seqs 128
Then I got this error:
File "/home/ubuntu/biraj/vllm/myenv/lib/python3.12/site-packages/torch/_ops.py", line 1225, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
Here's the full error trace.
I also tried official vLLM, since openbmb/MiniCPM-o-2_6 is listed among vLLM's supported models. However, running vLLM's Docker image raised an error saying the model is not supported.
Command:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<my-hf-token>" \
-p 8000:8000 \
--ipc=host \
-d \
vllm/vllm-openai:latest \
--model openbmb/MiniCPM-o-2_6 \
--trust-remote-code \
--max-model-len 2048 \
--max-num-seqs 128
Error:
ValueError: Model architectures ['MiniCPMO'] are not supported for now.
Edit 1: I get the same error for vLLM 0.7.0 without Docker.
~/biraj/minicpmo$ vllm --version
INFO 02-03 04:52:09 __init__.py:183] Automatically detected platform cuda.
0.7.0
Command: the same vllm serve invocation mentioned at the beginning of this comment.
Error:
ValueError: Model architectures ['MiniCPMO'] are not supported for now.
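(For others triaging this, a quick way to check whether an installed vLLM build registers the architecture at all; a minimal sketch, assuming ModelRegistry.get_supported_archs() is available as in recent vLLM releases:)

from vllm import ModelRegistry

# 'MiniCPMO' only appears once the build includes the merged model code
# (per this thread, from v0.7.1 onward).
print("MiniCPMO" in ModelRegistry.get_supported_archs())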
Edit 2:
✅ I updated official vLLM to version 0.7.1 and the above issue was resolved.
Hi, I'm using the latest vllm==0.7.1, and now I get a different error at runtime: AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'
Is there more of the error traceback you can share?
Hi, here is the full traceback of the error:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/U03/syj/test.py", line 10, in <module>
[rank0]:     llm = LLM(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/utils.py", line 1039, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 240, in __init__
[rank0]:     self.llm_engine = self.engine_class.from_engine_args(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 482, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 274, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 414, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 99, in determine_num_available_blocks
[rank0]:     results = self.collective_rpc("determine_num_available_blocks")
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc
[rank0]:     answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/utils.py", line 2208, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/worker/worker.py", line 228, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1236, in profile_run
[rank0]:     self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1301, in _dummy_run
[rank0]:     .dummy_data_for_profiling(self.model_config,
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/inputs/registry.py", line 333, in dummy_data_for_profiling
[rank0]:     dummy_data = profiler.get_dummy_data(seq_len)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 161, in get_dummy_data
[rank0]:     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/profiling.py", line 139, in _get_dummy_mm_inputs
[rank0]:     return self.processor.apply(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 803, in apply
[rank0]:     result = super().apply(prompt, mm_data, hf_processor_mm_kwargs)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 1230, in apply
[rank0]:     hf_mm_placeholders = self._find_mm_placeholders(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 793, in _find_mm_placeholders
[rank0]:     return find_mm_placeholders(mm_prompt_repls, new_token_ids,
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 579, in find_mm_placeholders
[rank0]:     return dict(full_groupby_modality(it))
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 184, in full_groupby_modality
[rank0]:     return full_groupby(values, key=lambda x: x.modality)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/utils.py", line 873, in full_groupby
[rank0]:     for value in values:
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 534, in _iter_placeholders
[rank0]:     replacement = repl_info.get_replacement(item_idx)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 270, in get_replacement
[rank0]:     replacement = replacement(item_idx)
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/model_executor/models/minicpmo.py", line 355, in get_replacement_minicpmv
[rank0]:     return self.get_audio_prompt_texts(
[rank0]:   File "/root/miniconda3/envs/minicpm-o/lib/python3.10/site-packages/vllm/model_executor/models/minicpmo.py", line 232, in get_audio_prompt_texts
[rank0]:     return self.info.get_hf_processor().get_audio_placeholder(
[rank0]: AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'
Please update to the latest code from the HF model repository.
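If the model was downloaded earlier, one way to pull the updated remote code is to refresh the cached snapshot; a minimal sketch using huggingface_hub (trust_remote_code still has to be passed to vLLM as before):

from huggingface_hub import snapshot_download

# Re-fetches any files that changed upstream in the HF repo,
# including the remote processing code; unchanged files stay cached.
snapshot_download("openbmb/MiniCPM-o-2_6")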
Hi, that solved it, thanks!