[vllm] - Failed to run inference on MiniCPM-o with vLLM
起始日期 | Start Date
No response
实现PR | Implementation PR
No response
相关Issues | Reference Issues
No response
摘要 | Summary
Cannot run inference on MiniCPM-o following the official vLLM guide:
For MiniCPM-o 2.6, clone our fork of vLLM:
git clone https://github.com/OpenBMB/vllm.git
cd vllm
git checkout minicpmo
Install vLLM from source:
VLLM_USE_PRECOMPILED=1 pip install --editable .
Run MiniCPM-o 2.6 in the same way as the previous models (shown in the following example).
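Before running the example, a quick sanity check can confirm that the editable build imported correctly and that the MiniCPM-O architecture is registered. This is only a sketch; the architecture name "MiniCPMO" is an assumption based on the model's config.

import vllm
from vllm import ModelRegistry

# Print the installed vLLM version (should be the editable build from the fork).
print(vllm.__version__)
# Check whether the MiniCPM-O architecture is registered (name assumed to be "MiniCPMO").
print("MiniCPMO" in ModelRegistry.get_supported_archs())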
Env:
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
accelerate 1.3.0 pypi_0 pypi
aiohappyeyeballs 2.4.4 pypi_0 pypi
aiohttp 3.11.11 pypi_0 pypi
aiohttp-cors 0.7.0 pypi_0 pypi
aiosignal 1.3.2 pypi_0 pypi
airportsdata 20241001 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.8.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
async-timeout 5.0.1 pypi_0 pypi
attrs 24.3.0 pypi_0 pypi
audioread 3.0.1 pypi_0 pypi
blake3 1.0.2 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 defaults
ca-certificates 2024.12.31 h06a4308_0 defaults
cachetools 5.5.1 pypi_0 pypi
certifi 2024.12.14 pypi_0 pypi
cffi 1.17.1 pypi_0 pypi
charset-normalizer 3.4.1 pypi_0 pypi
click 8.1.8 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colorful 0.5.6 pypi_0 pypi
compressed-tensors 0.8.1 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
deepspeed 0.15.4 pypi_0 pypi
depyf 0.18.0 pypi_0 pypi
dill 0.3.9 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distlib 0.3.9 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
einops 0.8.0 pypi_0 pypi
einx 0.3.0 pypi_0 pypi
encodec 0.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
fastapi 0.115.7 pypi_0 pypi
filelock 3.17.0 pypi_0 pypi
frozendict 2.4.6 pypi_0 pypi
frozenlist 1.5.0 pypi_0 pypi
fsspec 2024.12.0 pypi_0 pypi
gguf 0.10.0 pypi_0 pypi
google-api-core 2.24.0 pypi_0 pypi
google-auth 2.38.0 pypi_0 pypi
googleapis-common-protos 1.66.0 pypi_0 pypi
grpcio 1.70.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hjson 3.1.0 pypi_0 pypi
httpcore 1.0.7 pypi_0 pypi
httptools 0.6.4 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.27.1 pypi_0 pypi
idna 3.10 pypi_0 pypi
importlib-metadata 8.6.1 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
jinja2 3.1.5 pypi_0 pypi
jiter 0.8.2 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonlines 4.0.0 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2024.10.1 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0 defaults
libffi 3.4.4 h6a678d5_1 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
librosa 0.10.2.post1 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1 defaults
libuuid 1.41.5 h5eee18b_0 defaults
llvmlite 0.44.0 pypi_0 pypi
lm-format-enforcer 0.10.9 pypi_0 pypi
markupsafe 3.0.2 pypi_0 pypi
mistral-common 1.5.2 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.1.0 pypi_0 pypi
msgspec 0.19.0 pypi_0 pypi
multidict 6.1.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.4.2 pypi_0 pypi
ninja 1.11.1.3 pypi_0 pypi
numba 0.61.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
openai 1.60.0 pypi_0 pypi
opencensus 0.11.4 pypi_0 pypi
opencensus-context 0.1.3 pypi_0 pypi
opencv-python-headless 4.11.0.86 pypi_0 pypi
openssl 3.0.15 h5eee18b_0 defaults
outlines 0.1.11 pypi_0 pypi
outlines-core 0.1.26 pypi_0 pypi
packaging 24.2 pypi_0 pypi
partial-json-parser 0.2.1.1.post5 pypi_0 pypi
peft 0.14.0 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py310h06a4308_0 defaults
platformdirs 4.3.6 pypi_0 pypi
pluggy 1.5.0 pypi_0 pypi
pooch 1.8.2 pypi_0 pypi
prometheus-client 0.21.1 pypi_0 pypi
prometheus-fastapi-instrumentator 7.0.2 pypi_0 pypi
propcache 0.2.1 pypi_0 pypi
proto-plus 1.25.0 pypi_0 pypi
protobuf 5.29.3 pypi_0 pypi
psutil 6.1.1 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
py-spy 0.4.0 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.1 pypi_0 pypi
pybind11 2.13.6 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pydantic 2.10.6 pypi_0 pypi
pydantic-core 2.27.2 pypi_0 pypi
pytest 8.3.4 pypi_0 pypi
python 3.10.16 he870216_1 defaults
python-dotenv 1.0.1 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 26.2.0 pypi_0 pypi
ray 2.41.0 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
referencing 0.36.1 pypi_0 pypi
regex 2024.11.6 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rpds-py 0.22.3 pypi_0 pypi
rsa 4.9 pypi_0 pypi
safetensors 0.5.2 pypi_0 pypi
scikit-learn 1.6.1 pypi_0 pypi
scipy 1.15.1 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 75.1.0 py310h06a4308_0 defaults
six 1.17.0 pypi_0 pypi
smart-open 7.1.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soundfile 0.13.0 pypi_0 pypi
soxr 0.5.0.post1 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 defaults
starlette 0.45.2 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tiktoken 0.7.0 pypi_0 pypi
tk 8.6.14 h39e8969_0 defaults
tokenizers 0.21.0 pypi_0 pypi
tomli 2.2.1 pypi_0 pypi
torch 2.5.1 pypi_0 pypi
torchaudio 2.3.1 pypi_0 pypi
torchvision 0.20.1 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
transformers 4.48.1 pypi_0 pypi
triton 3.1.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2025a h04d1e81_0 defaults
urllib3 2.3.0 pypi_0 pypi
uvicorn 0.34.0 pypi_0 pypi
uvloop 0.21.0 pypi_0 pypi
vector-quantize-pytorch 1.21.2 pypi_0 pypi
virtualenv 20.29.1 pypi_0 pypi
vllm 0.1.dev4167+g2756ee8.precompiled pypi_0 pypi
vocos 0.1.0 pypi_0 pypi
watchfiles 1.0.4 pypi_0 pypi
websockets 14.2 pypi_0 pypi
wheel 0.44.0 py310h06a4308_0 defaults
wrapt 1.17.2 pypi_0 pypi
xformers 0.0.28.post3 pypi_0 pypi
xgrammar 0.1.11 dev_0
基本示例 | Basic Example
from transformers import AutoTokenizer
from PIL import Image
from vllm import LLM, SamplingParams
MODEL_NAME = "openbmb/MiniCPM-o-2_6"
# Also available for previous models
# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
# MODEL_NAME = "HwwwH/MiniCPM-V-2"
image = Image.open("/home/test/image.png").convert("RGB")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(
model=MODEL_NAME,
trust_remote_code=True,
gpu_memory_utilization=1,
max_model_len=2048
)
messages = [{
    "role": "user",
    # The number of "(<image>./</image>)" placeholders must match the number of images
    "content": "(<image>./</image>)" + \
        "\nWhat is the content of this image?"
}]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Single Inference
inputs = {
"prompt": prompt,
"multi_modal_data": {
"image": image
# For multiple images, the number of images must equal the number of `(<image>./</image>)` placeholders
# "image": [image, image]
},
}
# Batch Inference
# inputs = [{
#     "prompt": prompt,
#     "multi_modal_data": {
#         "image": image
#     },
# } for _ in range(2)]
# 2.6
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
# 2.0
# stop_token_ids = [tokenizer.eos_id]
# 2.5
# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
sampling_params = SamplingParams(
stop_token_ids=stop_token_ids,
use_beam_search=True,
temperature=0,
best_of=3,
max_tokens=1024
)
outputs = llm.generate(inputs, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
缺陷 | Drawbacks
Exception has occurred: AttributeError
Error in model execution (input dumped to /tmp/err_execute_model_input_20250124-153720.pkl): '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
  File "/home/test/test03/zhangzhong/vllm/vllm/worker/model_runner_base.py", line 115, in _wrapper
    return func(*args, **kwargs)
  File "/home/test/test03/zhangzhong/vllm/vllm/worker/model_runner.py", line 1716, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/minicpmv.py", line 568, in forward
    output = self.llm.model(
  File "/home/test/test03/zhangzhong/vllm/vllm/compilation/decorators.py", line 170, in __call__
    return self.forward(*args, **kwargs)
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 338, in forward
    hidden_states, residual = layer(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 245, in forward
    hidden_states = self.self_attn(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 177, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/layer.py", line 152, in forward
    torch.ops.vllm.unified_attention_with_output(
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/layer.py", line 277, in unified_attention_with_output
    self.impl.forward(query,
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/backends/flash_attn.py", line 740, in forward
    flash_attn_varlen_func(
  File "/home/test/test03/zhangzhong/vllm/vllm/vllm_flash_attn/flash_attn_interface.py", line 154, in flash_attn_varlen_func
    out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
AttributeError: '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
未解决问题 | Unresolved questions
No response
I have the same problem. Looking more closely, the requirements of the installed vLLM conflict with the requirements of MiniCPM-o. How should this be resolved? Do I need to install different versions?
Same here; I recreated the conda environment and still got the same error.
The above is likely related to the CUDA and torch versions. That said, MiniCPM-O has now been merged into the official repository, so you can try building directly from the official main branch, or wait for the next official vLLM wheel release.
requirement
If there is a conflict, just use the requirements from the vLLM repository. vLLM does not use the model code from the HF repository; it only uses the weights and the processor.
@HwwwwwwwH I also hit this on vLLM v0.7.1: it fails to run. My environment and launch command are below; the error is AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'.
Your current environment
INFO 02-06 15:40:56 init.py:186] Automatically detected platform cuda. Collecting environment information... PyTorch version: 2.5.1+cu124 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 Clang version: Could not collect CMake version: version 3.16.3 Libc version: glibc-2.31
Python version: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0] (64-bit runtime) Python platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.31 Is CUDA available: True CUDA runtime version: 11.8.89 CUDA_MODULE_LOADING set to: LAZY GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090 GPU 1: NVIDIA GeForce RTX 3090 GPU 2: NVIDIA GeForce RTX 3090 GPU 3: NVIDIA GeForce RTX 3090 GPU 4: NVIDIA GeForce RTX 3090 GPU 5: NVIDIA GeForce RTX 3090 GPU 6: NVIDIA GeForce RTX 3090 GPU 7: NVIDIA GeForce RTX 3090
Nvidia driver version: 555.42.02 cuDNN version: Probably one of the following: /usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.2.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.1 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.1 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True
CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 80 On-line CPU(s) list: 0-79 Thread(s) per core: 2 Core(s) per socket: 20 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz Stepping: 4 CPU MHz: 1100.000 CPU max MHz: 3000.0000 CPU min MHz: 1000.0000 BogoMIPS: 5000.00 Virtualization: VT-x L1d cache: 1.3 MiB L1i cache: 1.3 MiB L2 cache: 40 MiB L3 cache: 55 MiB NUMA node0 CPU(s): 0-19,40-59 NUMA node1 CPU(s): 20-39,60-79 Vulnerability Gather data sampling: Mitigation; Microcode Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable Vulnerability Meltdown: Mitigation; PTI Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable Vulnerability Retbleed: Mitigation; IBRS Vulnerability Spec rstack overflow: Not affected Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d arch_capabilities
Versions of relevant libraries: [pip3] numpy==1.26.4 [pip3] nvidia-cublas-cu12==12.4.5.8 [pip3] nvidia-cuda-cupti-cu12==12.4.127 [pip3] nvidia-cuda-nvrtc-cu12==12.4.127 [pip3] nvidia-cuda-runtime-cu12==12.4.127 [pip3] nvidia-cudnn-cu12==9.1.0.70 [pip3] nvidia-cufft-cu12==11.2.1.3 [pip3] nvidia-curand-cu12==10.3.5.147 [pip3] nvidia-cusolver-cu12==11.6.1.9 [pip3] nvidia-cusparse-cu12==12.3.1.170 [pip3] nvidia-ml-py==12.570.86 [pip3] nvidia-nccl-cu12==2.21.5 [pip3] nvidia-nvjitlink-cu12==12.4.127 [pip3] nvidia-nvtx-cu12==12.4.127 [pip3] pyzmq==26.2.1 [pip3] torch==2.5.1 [pip3] torchaudio==2.5.1 [pip3] torchvision==0.20.1 [pip3] transformers==4.48.2 [pip3] triton==3.1.0 [conda] numpy 1.26.4 pypi_0 pypi [conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi [conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi [conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi [conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi [conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi [conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi [conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi [conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi [conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi [conda] nvidia-ml-py 12.570.86 pypi_0 pypi [conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi [conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi [conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi [conda] pyzmq 26.2.1 pypi_0 pypi [conda] torch 2.5.1 pypi_0 pypi [conda] torchaudio 2.5.1 pypi_0 pypi [conda] torchvision 0.20.1 pypi_0 pypi [conda] transformers 4.48.2 pypi_0 pypi [conda] triton 3.1.0 pypi_0 pypi ROCM Version: Could not collect Neuron SDK Version: N/A vLLM Version: 0.7.2.dev62+g56534cd5 vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled GPU Topology: GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID GPU0 X PIX NODE NODE SYS SYS SYS SYS 0-19,40-59 0 N/A GPU1 PIX X NODE NODE SYS SYS SYS SYS 0-19,40-59 0 N/A GPU2 NODE NODE X PIX SYS SYS SYS SYS 0-19,40-59 0 N/A GPU3 NODE NODE PIX X SYS SYS SYS SYS 0-19,40-59 0 N/A GPU4 SYS SYS SYS SYS X PIX NODE NODE 20-39,60-79 1 N/A GPU5 SYS SYS SYS SYS PIX X NODE NODE 20-39,60-79 1 N/A GPU6 SYS SYS SYS SYS NODE NODE X PIX 20-39,60-79 1 N/A GPU7 SYS SYS SYS SYS NODE NODE PIX X 20-39,60-79 1 N/A
Legend:
X = Self SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI) NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU) PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge) PIX = Connection traversing at most a single PCIe bridge NV# = Connection traversing a bonded set of # NVLinks
VLLM_USE_MODELSCOPE=True VLLM_ALLOW_RUNTIME_LORA_UPDATING=True LD_LIBRARY_PATH=/usr/local/cuda/lib64: NCCL_CUMEM_ENABLE=0 TORCHINDUCTOR_COMPILE_THREADS=1 CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
I am using vLLM v0.7.1 as well as the latest version (0.7.2.dev62+g56534cd5).
I start vLLM with the following command to deploy MiniCPM-o-2_6, and it fails.
CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
  --model="/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6" \
  --served-model-name "test" \
  --host 0.0.0.0 \
  --trust-remote-code \
  --max-model-len=4096 \
  --max-num-seqs=5 \
  --port 9001
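For reference, once the server does start, it can be queried through the OpenAI-compatible API. The snippet below is only a sketch: the image URL is a placeholder, and it assumes the server above is reachable on localhost:9001 with the served model name "test".

from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:9001/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="test",  # matches --served-model-name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},  # placeholder URL
            {"type": "text", "text": "What is the content of this image?"},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)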
It fails with the following error:
ERROR 02-06 15:38:08 core.py:210]   File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/model_executor/models/minicpmo.py", line 236, in get_audio_prompt_texts
ERROR 02-06 15:38:08 core.py:210]     return self.info.get_hf_processor().get_audio_placeholder(
ERROR 02-06 15:38:08 core.py:210]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-06 15:38:08 core.py:210] AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'
The full output is below:
(vllm_71) root@ubun:~# CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
--model="/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6" \
  --served-model-name "test" \
  --host 0.0.0.0 \
  --trust-remote-code \
  --max-model-len=4096 \
  --max-num-seqs=5 \
  --port 9001
INFO 02-06 15:37:49 init.py:186] Automatically detected platform cuda.
WARNING 02-06 15:37:51 api_server.py:632] Lora dynamic loading & unloading is enabled in the API server. This should ONLY be used for local development!
INFO 02-06 15:37:51 api_server.py:840] vLLM API server version 0.7.2.dev62+g56534cd5
INFO 02-06 15:37:51 api_server.py:841] args: Namespace(host='0.0.0.0', port=9001, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=4096, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=5, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['test'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
WARNING 02-06 15:37:51 arg_utils.py:1326] Setting max_num_batched_tokens to 2048 for OPENAI_API_SERVER usage context.
INFO 02-06 15:37:57 config.py:542] This model supports multiple tasks: {'classify', 'generate', 'score', 'embed', 'reward'}. Defaulting to 'generate'.
INFO 02-06 15:37:57 config.py:1557] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 02-06 15:37:58 core.py:47] Initializing a V1 LLM engine (v0.7.2.dev62+g56534cd5) with config: model='/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6', speculative_config=None, tokenizer='/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=test, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 02-06 15:37:59 registry.py:340] mm_limits has already been set for model=/home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6, and will be overwritten by the new values.
/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/transformers/models/auto/image_processing_auto.py:590: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
INFO 02-06 15:38:01 gpu_model_runner.py:867] Starting to load model /home/ubuntu/.cache/modelscope/hub/OpenBMB/MiniCPM-o-2_6...
INFO 02-06 15:38:01 cuda.py:158] Using Flash Attention backend on V1 engine.
WARNING 02-06 15:38:01 topk_topp_sampler.py:46] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 02-06 15:38:01 cuda.py:158] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:04, 1.53s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:02<00:02, 1.30s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:04<00:01, 1.44s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.48s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.45s/it]
INFO 02-06 15:38:08 gpu_model_runner.py:872] Loading model weights took 15.7985 GB INFO 02-06 15:38:08 gpu_model_runner.py:951] Encoder cache will be initialized with a budget of 2574 tokens, and profiled with 1 video items of the maximum feature size. ERROR 02-06 15:38:08 core.py:210] EngineCore hit an exception: Traceback (most recent call last): ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 202, in run_engine_core ERROR 02-06 15:38:08 core.py:210] engine_core = EngineCoreProc(*args, **kwargs) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 156, in init ERROR 02-06 15:38:08 core.py:210] super().init(vllm_config, executor_class) ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 54, in init ERROR 02-06 15:38:08 core.py:210] num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches( ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 79, in _initialize_kv_caches ERROR 02-06 15:38:08 core.py:210] availble_gpu_memory = self.model_executor.determine_available_memory() ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 61, in determine_available_memory ERROR 02-06 15:38:08 core.py:210] output = self.collective_rpc("determine_available_memory") ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc ERROR 02-06 15:38:08 core.py:210] answer = run_method(self.driver_worker, method, args, kwargs) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/utils.py", line 2220, in run_method ERROR 02-06 15:38:08 core.py:210] return func(*args, **kwargs) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context ERROR 02-06 15:38:08 core.py:210] return func(*args, **kwargs) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 164, in determine_available_memory ERROR 02-06 15:38:08 core.py:210] self.model_runner.profile_run() ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 957, in profile_run ERROR 02-06 15:38:08 core.py:210] dummy_request_data = self.input_registry.dummy_data_for_profiling( ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/inputs/registry.py", line 353, in 
dummy_data_for_profiling ERROR 02-06 15:38:08 core.py:210] dummy_data = profiler.get_dummy_data(seq_len) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/profiling.py", line 164, in get_dummy_data ERROR 02-06 15:38:08 core.py:210] mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/profiling.py", line 141, in _get_dummy_mm_inputs ERROR 02-06 15:38:08 core.py:210] return self.processor.apply( ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/model_executor/models/minicpmv.py", line 812, in apply ERROR 02-06 15:38:08 core.py:210] result = super().apply(prompt, mm_data, hf_processor_mm_kwargs) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 1236, in apply ERROR 02-06 15:38:08 core.py:210] hf_mm_placeholders = self._find_mm_placeholders( ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 799, in _find_mm_placeholders ERROR 02-06 15:38:08 core.py:210] return find_mm_placeholders(mm_prompt_repls, new_token_ids, ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 581, in find_mm_placeholders ERROR 02-06 15:38:08 core.py:210] return dict(full_groupby_modality(it)) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 186, in full_groupby_modality ERROR 02-06 15:38:08 core.py:210] return full_groupby(values, key=lambda x: x.modality) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/utils.py", line 885, in full_groupby ERROR 02-06 15:38:08 core.py:210] for value in values: ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 536, in _iter_placeholders ERROR 02-06 15:38:08 core.py:210] replacement = repl_info.get_replacement(item_idx) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/multimodal/processing.py", line 272, in get_replacement ERROR 02-06 15:38:08 core.py:210] replacement = replacement(item_idx) ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/model_executor/models/minicpmo.py", line 359, in get_replacement_minicpmv ERROR 02-06 15:38:08 core.py:210] return self.get_audio_prompt_texts( ERROR 02-06 15:38:08 
core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] File "/home/ubuntu/miniconda3/envs/vllm_71/lib/python3.11/site-packages/vllm/model_executor/models/minicpmo.py", line 236, in get_audio_prompt_texts ERROR 02-06 15:38:08 core.py:210] return self.info.get_hf_processor().get_audio_placeholder( ERROR 02-06 15:38:08 core.py:210] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 02-06 15:38:08 core.py:210] AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder' ERROR 02-06 15:38:08 core.py:210] CRITICAL 02-06 15:38:08 core_client.py:158] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
What should I do to run MiniCPM-o correctly?
@HwwwwwwwH I also hit this on vLLM v0.7.1: it fails to run. My environment and launch command are below; the error is AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'.
Updating to the latest HF repository code should fix it.
@HwwwwwwwH I also hit this on vLLM v0.7.1: it fails to run. My environment and launch command are below; the error is AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'.
Updating to the latest HF repository code should fix it.
RuntimeError: Failed to apply MiniCPMOProcessor on data={'text': '(<image>./</image>)', 'images': [<PIL.Image.Image image mode=RGB size=448x4032 at 0x7F924C6B6350>]} with kwargs={}. I get this error after updating.
There should be an earlier part of the error before this; please post it so we can pinpoint the issue, or just post the full traceback.
@HwwwwwwwH I also hit this on vLLM v0.7.1: it fails to run. My environment and launch command are below; the error is AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'. Updating to the latest HF repository code should fix it. RuntimeError: Failed to apply MiniCPMOProcessor on data={'text': '(<image>./</image>)', 'images': [<PIL.Image.Image image mode=RGB size=448x4032 at 0x7F924C6B6350>]} with kwargs={}. I get this error after updating.
Same error here! Is there a solution?
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/registry.py", line 160, in call_hf_processor
[rank0]:     return hf_processor(**data, **merged_kwargs, return_tensors="pt")
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/.cache/huggingface/modules/transformers_modules/processing_minicpmo.py", line 77, in __call__
[rank0]:     image_inputs = self.image_processor(
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 41, in __call__
[rank0]:     return self.preprocess(images, **kwargs)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/.cache/huggingface/modules/transformers_modules/image_processing_minicpmv.py", line 382, in preprocess
[rank0]:     self.normalize(image=image, mean=self.mean, std=self.std, input_data_format=input_data_format)
[rank0]:   File "/opt/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 111, in normalize
[rank0]:     return normalize(
[rank0]:     ^^^^^^^^^^
[rank0]:   File "/opt/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/image_transforms.py", line 411, in normalize
[rank0]:     image = (image - mean) / std
[rank0]:             ~~~~~~^~~~~~
[rank0]: ValueError: operands could not be broadcast together with shapes (1344,154,3) (3,3)
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/ai/yzy/exams/vllm_exam01.py", line 746, in
I couldn't reproduce this problem. Judging from the traceback, it could be a difference in the transformers/numpy versions. I'm on transformers==4.48.2 and numpy==1.26.4.
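A quick way to confirm which versions are actually active in the environment (just a sketch using the standard library):

from importlib.metadata import version

# Print the installed versions of the packages discussed in this thread.
for pkg in ("transformers", "numpy", "vllm"):
    print(pkg, version(pkg))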
I couldn't reproduce this problem. Judging from the traceback, it could be a difference in the transformers/numpy versions. I'm on transformers==4.48.2 and numpy==1.26.4.
(vllm) root@autodl-container-d7e74eb6f6-8fc1d0bd:~/autodl-tmp# pip install transformers==4.48.2 Looking in indexes: http://mirrors.aliyun.com/pypi/simple Collecting transformers==4.48.2 Downloading http://mirrors.aliyun.com/pypi/packages/bd/40/902c95a2a6f5d2d120c940ac4bd1f937c01035af529803c13d65ca33c2d1/transformers-4.48.2-py3-none-any.whl (9.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 13.8 MB/s eta 0:00:00 Requirement already satisfied: filelock in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (3.17.0) Requirement already satisfied: huggingface-hub<1.0,>=0.24.0 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (0.28.1) Requirement already satisfied: numpy>=1.17 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (1.26.4) Requirement already satisfied: packaging>=20.0 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (24.2) Requirement already satisfied: pyyaml>=5.1 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (6.0.2) Requirement already satisfied: regex!=2019.12.17 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (2024.11.6) Requirement already satisfied: requests in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (2.32.3) Requirement already satisfied: tokenizers<0.22,>=0.21 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (0.21.0) Requirement already satisfied: safetensors>=0.4.1 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (0.5.2) Requirement already satisfied: tqdm>=4.27 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from transformers==4.48.2) (4.67.1) Requirement already satisfied: fsspec>=2023.5.0 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.24.0->transformers==4.48.2) (2025.2.0) Requirement already satisfied: typing-extensions>=3.7.4.3 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.24.0->transformers==4.48.2) (4.12.2) Requirement already satisfied: charset-normalizer<4,>=2 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from requests->transformers==4.48.2) (3.4.1) Requirement already satisfied: idna<4,>=2.5 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from requests->transformers==4.48.2) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from requests->transformers==4.48.2) (2.3.0) Requirement already satisfied: certifi>=2017.4.17 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (from requests->transformers==4.48.2) (2025.1.31) Installing collected packages: transformers Attempting uninstall: transformers Found existing installation: transformers 4.49.0.dev0 Uninstalling transformers-4.49.0.dev0: Successfully uninstalled transformers-4.49.0.dev0 Successfully installed transformers-4.48.2 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
(vllm) root@autodl-container-d7e74eb6f6-8fc1d0bd:~/autodl-tmp# pip install numpy==1.26.4 Looking in indexes: http://mirrors.aliyun.com/pypi/simple Requirement already satisfied: numpy==1.26.4 in /root/miniconda3/envs/vllm/lib/python3.11/site-packages (1.26.4) WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. (vllm) root@autodl-container-d7e74eb6f6-8fc1d0bd:~/autodl-tmp# python 1.py INFO 02-07 13:37:42 init.py:183] Automatically detected platform cuda. INFO 02-07 13:37:48 config.py:526] This model supports multiple tasks: {'embed', 'generate', 'classify', 'reward', 'score'}. Defaulting to 'generate'. INFO 02-07 13:37:48 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='./model', speculative_config=None, tokenizer='./model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=./model, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=True, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, INFO 02-07 13:37:49 cuda.py:235] Using Flash Attention backend. INFO 02-07 13:37:49 model_runner.py:1111] Starting to load model ./model... INFO 02-07 13:37:49 cuda.py:219] Cannot use FlashAttention-2 backend for head size 72. INFO 02-07 13:37:49 cuda.py:232] Using XFormers backend. Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:01, 1.76it/s] Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.48it/s] Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:01<00:00, 1.50it/s] Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.63it/s] Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.59it/s]
INFO 02-07 13:37:53 model_runner.py:1116] Loading model weights took 15.7985 GB
/root/miniconda3/envs/vllm/lib/python3.11/site-packages/transformers/models/auto/image_processing_auto.py:590: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/autodl-tmp/1.py", line 11, in
INFO 02-07 13:41:34 model_runner.py:1116] Loading model weights took 15.7985 GB
/root/autodl-tmp/transformers-main/src/transformers/models/auto/image_processing_auto.py:592: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/inputs/registry.py", line 160, in call_hf_processor
[rank0]: return hf_processor(**data, **merged_kwargs, return_tensors="pt")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/model/processing_minicpmo.py", line 77, in call
[rank0]: image_inputs = self.image_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/autodl-tmp/transformers-main/src/transformers/image_processing_utils.py", line 42, in call
[rank0]: return self.preprocess(images, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/model/image_processing_minicpmv.py", line 381, in preprocess
[rank0]: image_patches = [
[rank0]: ^
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/model/image_processing_minicpmv.py", line 382, in
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/autodl-tmp/1.py", line 11, in
The reason get_audio_placeholder is missing is that the MiniCPM-o-2_6 repository you are using is not the latest; in the earlier processor code this interface was not exposed. If you still see this error after changing the transformers version, it means you also need to pull the latest model repository code after switching versions.
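If the model was downloaded earlier (e.g. from ModelScope), one way to refresh just the remote-code files is to pull them again from the Hugging Face repo. This is only a sketch: the target directory is a placeholder and should be the local path you pass as --model or model=.

from huggingface_hub import hf_hub_download

# Refresh the remote-code files so the updated MiniCPMOProcessor is used.
for filename in ("processing_minicpmo.py", "modeling_minicpmo.py"):
    path = hf_hub_download(
        repo_id="openbmb/MiniCPM-o-2_6",
        filename=filename,
        local_dir="/path/to/local/MiniCPM-o-2_6",  # placeholder: your local model directory
    )
    print("updated", path)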
After changing transformers to version 4.48.2, I initially got the error AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'. After searching, I downloaded the files processing_minicpmo.py and modeling_minicpmo.py from HF to update them, and it worked! So the model on ModelScope is not the latest; it's best to download the latest files from HF. Thanks!
The minicpm-v-2_6 model had a similar issue at first; after switching the transformers version, it initially reported:
The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (1856). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
After I changed max_model_len from 4096 to 2048 in the code, it also worked!
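For reference, a minimal sketch of that adjustment on the LLM constructor (values taken from this thread, not tuned):

from vllm import LLM

llm = LLM(
    model="openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,
    max_model_len=2048,           # was 4096; shrinks the KV-cache requirement
    gpu_memory_utilization=0.95,  # alternatively, raise this instead of lowering max_model_len
)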
After changing transformers to version 4.48.2, I initially got the error AttributeError: 'MiniCPMOProcessor' object has no attribute 'get_audio_placeholder'. After searching, I downloaded the files processing_minicpmo.py and modeling_minicpmo.py from HF to update them, and it worked! So the model on ModelScope is not the latest; it's best to download the latest files from HF. Thanks!
Thanks for the feedback; we will sync the ModelScope repo as soon as possible.