Using vLLM to run inference with InternVL3-8B-hf returns ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
Package Version Editable project location
accelerate 1.6.0 aiofiles 22.1.0 aiohappyeyeballs 2.6.1 aiohttp 3.11.18 aiosignal 1.3.2 aiosqlite 0.21.0 airportsdata 20250224 alabaster 0.7.16 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.9.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astor 0.8.1 asttokens 3.0.0 attrs 25.3.0 audioread 3.0.1 av 14.3.0 babel 2.16.0 beautifulsoup4 4.13.3 blake3 1.0.4 bleach 6.2.0 blinker 1.5 blobfile 3.0.0 byted-mario-collector 2.0.8 byted-remote-ikernel 0.4.8 byted-torch 2.5.1.post1 byted-wandb 0.13.86 bytedance-context 0.7.1 bytedance.hdfs-stdenv 0.0.39 bytedance-metrics 0.5.2 bytedbackgrounds 0.0.6 byteddatabus 1.0.6 byteddps 0.1.2 bytedenv 0.6.4 bytedmemfd 0.2 bytedmetrics 0.10.2 bytedservicediscovery 0.18.0 bytedztijwthelper 0.0.23 bytedztispiffe 0.0.16 cachetools 5.5.2 certifi 2024.8.30 cffi 1.17.1 chardet 5.1.0 charset-normalizer 3.4.0 click 8.1.8 cloudpickle 3.1.1 comm 0.2.2 compressed-tensors 0.9.3 contourpy 1.3.2 cryptography 44.0.2 cupy-cuda12x 13.4.1 cycler 0.12.1 datasets 3.5.0 dbus-python 1.3.2 debugpy 1.8.14 decorator 5.2.1 deepspeed 0.16.7 defusedxml 0.7.1 Deprecated 1.2.18 depyf 0.18.0 devscripts 2.23.4+deb12u1 dill 0.3.8 diskcache 5.6.3 distro 1.8.0 distro-info 1.5+deb12u1 dnspython 2.7.0 docker-pycreds 0.4.0 docstring_parser 0.16 docutils 0.19 einops 0.8.1 email_validator 2.2.0 entrypoints 0.4 enum34 1.1.10 executing 2.2.0 fastapi 0.115.12 fastapi-cli 0.0.7 fastjsonschema 2.21.1 fastrlock 0.8.3 ffmpy 0.5.0 filelock 3.16.1 findspark 2.0.1 fire 0.7.0 fonttools 4.57.0 fqdn 1.5.1 frozenlist 1.6.0 fsspec 2024.10.0 gguf 0.16.3 gitdb 4.0.12 GitPython 3.1.44 googleapis-common-protos 1.70.0 gpg 1.18.0 gradio_client 1.8.0 greenlet 3.1.1 groovy 0.1.2 grpcio 1.71.0 h11 0.16.0 hf-xet 1.0.5 hjson 3.1.0 httpcore 1.0.9 httplib2 0.20.4 httptools 0.6.4 httpx 0.28.1 huggingface-hub 0.30.2 idna 3.10 imagesize 1.4.1 importlib_metadata 8.0.0 interegular 0.3.3 iotop 0.6 ipaddress 1.0.23 ipykernel 6.29.5 ipython 9.0.2 ipython-genutils 0.2.0 ipython_pygments_lexers 1.1.1 ipywidgets 8.1.5 isoduration 20.11.0 jedi 0.19.2 jieba 0.42.1 Jinja2 3.1.6 jiter 0.9.0 joblib 1.4.2 json5 0.10.0 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2024.10.1 jupyter 1.0.0 jupyter_client 7.4.9 jupyter-console 6.6.3 jupyter_core 5.7.2 jupyter-events 0.12.0 jupyter-kernel-gateway 2.5.2 jupyter_server 2.15.0 jupyter_server_fileid 0.9.3 jupyter_server_terminals 0.5.3 jupyter_server_ydoc 0.8.0 jupyter-ydoc 0.2.5 jupyterlab 3.6.8 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 jupyterlab_widgets 3.0.13 kiwisolver 1.4.8 lark 1.2.2 lazr.restfulclient 0.14.5 lazr.uri 1.0.6 lazy_loader 0.4 librosa 0.11.0 llamafactory 0.9.3.dev0 /mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory llguidance 0.7.19 llvmlite 0.44.0 lm-format-enforcer 0.10.11 lxml 5.4.0 markdown-it-py 3.0.0 MarkupSafe 3.0.2 matplotlib 3.10.1 matplotlib-inline 0.1.7 mdurl 0.1.2 merlin_kernel 0.1 mistral_common 1.5.4 mistune 3.1.3 mlx-python-sdk 0.3.0 modelscope 1.25.0 mpmath 1.3.0 msgpack 1.0.8 msgspec 0.19.0 multidict 6.4.3 multiprocess 0.70.16 nbclassic 1.2.0 nbclient 0.10.2 nbconvert 7.16.6 nbformat 5.10.4 nest-asyncio 1.6.0 networkx 3.4.2 ninja 1.11.1.4 nltk 3.9.1 none 0.1.1 notebook 6.5.7 notebook_shim 0.2.4 numba 0.61.2 numpy 1.26.4 nvidia-cublas-cu12 12.4.5.8 nvidia-cuda-cupti-cu12 12.4.127 nvidia-cuda-nvrtc-cu12 12.4.127 nvidia-cuda-runtime-cu12 12.4.127 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.2.1.3 nvidia-curand-cu12 10.3.5.147 nvidia-cusolver-cu12 11.6.1.9 nvidia-cusparse-cu12 12.3.1.170 nvidia-cusparselt-cu12 
0.6.2 nvidia-ml-py 12.570.86 nvidia-nccl-cu12 2.21.5 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.4.127 oauthlib 3.2.2 omegaconf 2.3.0 openai 1.76.0 opencv-python-headless 4.11.0.86 opentelemetry-api 1.26.0 opentelemetry-exporter-otlp 1.26.0 opentelemetry-exporter-otlp-proto-common 1.26.0 opentelemetry-exporter-otlp-proto-grpc 1.26.0 opentelemetry-exporter-otlp-proto-http 1.26.0 opentelemetry-proto 1.26.0 opentelemetry-sdk 1.26.0 opentelemetry-semantic-conventions 0.47b0 opentelemetry-semantic-conventions-ai 0.4.3 orjson 3.10.16 outlines 0.1.11 outlines_core 0.1.26 overrides 7.7.0 packaging 24.1 pandas 2.2.3 pandocfilters 1.5.1 parso 0.8.4 partial-json-parser 0.2.1.1.post5 pathtools 0.1.2 peft 0.15.1 pexpect 4.8.0 pillow 11.2.1 pip 25.0.1 platformdirs 4.3.7 pooch 1.8.2 prometheus_client 0.21.1 prometheus-fastapi-instrumentator 7.1.0 promise 2.3 prompt_toolkit 3.0.50 propcache 0.3.1 protobuf 3.20.3 psutil 7.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 py4j 0.10.9.9 pyairports 2.1.1 pyarrow 19.0.1 pycountry 24.6.1 pycparser 2.22 pycryptodomex 3.22.0 pydantic 2.10.6 pydantic_core 2.27.2 pydub 0.25.1 Pygments 2.18.0 PyGObject 3.42.2 PyJWT 2.6.0 pyOpenSSL 25.0.0 pyparsing 3.0.9 python-apt 2.6.0 python-dateutil 2.9.0.post0 python-debian 0.1.49 python-dotenv 1.0.1 python-json-logger 3.3.0 python-magic 0.4.26 python-multipart 0.0.20 pytz 2025.2 pyxdg 0.28 PyYAML 6.0.2 pyzmq 26.3.0 qtconsole 5.6.1 QtPy 2.4.3 ray 2.43.0 reactivex 4.0.4 referencing 0.36.2 regex 2024.11.6 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986 2.0.0 rfc3986-validator 0.1.1 rich 14.0.0 rich-toolkit 0.14.3 rouge-chinese 1.0.3 rpds-py 0.23.1 ruff 0.11.7 safehttpx 0.1.6 safetensors 0.5.3 schedule 1.2.2 scikit-learn 1.6.1 scipy 1.15.2 semantic-version 2.10.0 Send2Trash 1.8.3 sentencepiece 0.2.0 sentry-sdk 2.24.0 setproctitle 1.3.5 setuptools 65.7.0 shellingham 1.5.4 shortuuid 1.0.13 shtab 1.7.2 six 1.16.0 smmap 5.0.2 sniffio 1.3.1 snowballstemmer 2.2.0 soundfile 0.13.1 soupsieve 2.6 soxr 0.5.0.post1 Sphinx 5.3.0 sphinxcontrib-applehelp 2.0.0 sphinxcontrib-devhelp 2.0.0 sphinxcontrib-htmlhelp 2.1.0 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 2.0.0 sphinxcontrib-serializinghtml 2.0.0 sphinxcontrib-websupport 2.0.0 SQLAlchemy 2.0.27 sse-starlette 2.3.3 stack-data 0.6.3 starlette 0.46.2 sympy 1.13.1 termcolor 3.0.1 terminado 0.18.1 threadpoolctl 3.6.0 tiktoken 0.9.0 tinycss2 1.4.0 tokenizers 0.21.1 tomlkit 0.13.2 torch 2.6.0 torchaudio 2.6.0 torchvision 0.21.0 tornado 6.4.2 tqdm 4.67.1 traitlets 5.14.3 transformers 4.52.0.dev0 triton 3.2.0 trl 0.9.6 typer 0.15.2 types-python-dateutil 2.9.0.20241206 typing_extensions 4.12.2 tyro 0.8.14 tzdata 2025.2 ujson 5.10.0 unattended-upgrades 0.1 unidiff 0.7.3 unzip 1.0.0 uri-template 1.3.0 urllib3 1.26.20 uvicorn 0.34.2 uvloop 0.21.0 vllm 0.8.5 wadllib 1.3.6 watchfiles 1.0.5 wcwidth 0.2.13 webcolors 24.11.1 webencodings 0.5.1 websocket-client 1.8.0 websockets 15.0.1 wheel 0.44.0 widgetsnbextension 4.0.13 wrapt 1.17.2 xdg 5 xformers 0.0.29.post2 xgrammar 0.1.18 xxhash 3.5.0 y-py 0.6.2 yarl 1.20.0 ypy-websocket 0.8.4 zipp 3.21.0
Reproduction
Traceback (most recent call last):
File "/mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory/scripts/vllm_image.py", line 169, in limit_mm_per_prompt is only supported for "
ValueError: limit_mm_per_prompt is only supported for multimodal models.
Others
This is the vLLM inference script:
Copyright 2025 the LlamaFactory team.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
import json
from typing import Optional

import fire
from transformers import Seq2SeqTrainingArguments

from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.extras.misc import get_device_count
from llamafactory.extras.packages import is_vllm_available
from llamafactory.hparams import get_infer_args
from llamafactory.model import load_tokenizer

if is_vllm_available():
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

def vllm_infer(
    model_name_or_path: str,
    adapter_name_or_path: str = None,
    dataset: str = "Qwen-921",
    dataset_dir: str = "/mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory/data",
    template: str = "intern_vl",
    cutoff_len: int = 2048,
    max_samples: Optional[int] = None,
    vllm_config: str = "{}",
    save_name: str = "/mlx_devbox/users/zhaomeng.2000/playground/InternVL3-image/result/result-InternVL3-8B-NoPe-921.jsonl",
    temperature: float = 0.6,
    top_p: float = 0.95,
    top_k: int = 50,
    max_new_tokens: int = 1024,
    repetition_penalty: float = 1.0,
    skip_special_tokens: bool = True,
    seed: Optional[int] = None,
    pipeline_parallel_size: int = 1,
    image_max_pixels: int = 768 * 768,
    image_min_pixels: int = 32 * 32,
    video_fps: float = 2.0,
    video_maxlen: int = 128,
):
    r"""Perform batch generation using vLLM engine, which supports tensor parallelism.
Usage: python vllm_infer.py --model_name_or_path meta-llama/Llama-2-7b-hf --template llama --dataset alpaca_en_demo
"""
if pipeline_parallel_size > get_device_count():
raise ValueError("Pipeline parallel size should be smaller than the number of gpus.")
model_args, data_args, _, generating_args = get_infer_args(
dict(
model_name_or_path=model_name_or_path,
adapter_name_or_path=adapter_name_or_path,
dataset=dataset,
dataset_dir=dataset_dir,
template=template,
cutoff_len=cutoff_len,
max_samples=max_samples,
preprocessing_num_workers=16,
vllm_config=vllm_config,
temperature=temperature,
top_p=top_p,
top_k=top_k,
max_new_tokens=max_new_tokens,
repetition_penalty=repetition_penalty,
)
)
training_args = Seq2SeqTrainingArguments(output_dir="dummy_dir")
tokenizer_module = load_tokenizer(model_args)
tokenizer = tokenizer_module["tokenizer"]
template_obj = get_template_and_fix_tokenizer(tokenizer, data_args)
template_obj.mm_plugin.expand_mm_tokens = False # for vllm generate
dataset_module = get_dataset(template_obj, model_args, data_args, training_args, "ppo", **tokenizer_module)
inputs, prompts, labels = [], [], []
for sample in dataset_module["train_dataset"]:
if sample["images"]:
multi_modal_data = {
"image": template_obj.mm_plugin._regularize_images(
sample["images"], image_max_pixels=image_max_pixels, image_min_pixels=image_min_pixels
)["images"]
}
elif sample["videos"]:
multi_modal_data = {
"video": template_obj.mm_plugin._regularize_videos(
sample["videos"],
image_max_pixels=image_max_pixels,
image_min_pixels=image_min_pixels,
video_fps=video_fps,
video_maxlen=video_maxlen,
)["videos"]
}
elif sample["audios"]:
audio_data = template_obj.mm_plugin._regularize_audios(
sample["audios"],
sampling_rate=16000,
)
multi_modal_data = {"audio": zip(audio_data["audios"], audio_data["sampling_rates"])}
else:
multi_modal_data = None
inputs.append({"prompt_token_ids": sample["input_ids"], "multi_modal_data": multi_modal_data})
prompts.append(tokenizer.decode(sample["input_ids"], skip_special_tokens=skip_special_tokens))
labels.append(
tokenizer.decode(
list(filter(lambda x: x != IGNORE_INDEX, sample["labels"])), skip_special_tokens=skip_special_tokens
)
)
sampling_params = SamplingParams(
repetition_penalty=generating_args.repetition_penalty or 1.0, # repetition_penalty must > 0
temperature=generating_args.temperature,
top_p=generating_args.top_p or 1.0, # top_p must > 0
top_k=generating_args.top_k or -1, # top_k must > 0
stop_token_ids=template_obj.get_stop_token_ids(tokenizer),
max_tokens=generating_args.max_new_tokens,
skip_special_tokens=skip_special_tokens,
seed=seed,
)
if model_args.adapter_name_or_path is not None:
lora_request = LoRARequest("default", 1, model_args.adapter_name_or_path[0])
else:
lora_request = None
engine_args = {
"model": model_args.model_name_or_path,
"trust_remote_code": True,
"dtype": model_args.infer_dtype,
"max_model_len": cutoff_len + max_new_tokens,
"tensor_parallel_size": (get_device_count() // pipeline_parallel_size) or 1,
"pipeline_parallel_size": pipeline_parallel_size,
"disable_log_stats": True,
"enable_lora": model_args.adapter_name_or_path is not None,
# "dtype": "float16"
}
if template_obj.mm_plugin.__class__.__name__ != "BasePlugin":
engine_args["limit_mm_per_prompt"] = {"image": 4, "video": 2, "audio": 2}
if isinstance(model_args.vllm_config, dict):
engine_args.update(model_args.vllm_config)
results = LLM(**engine_args).generate(inputs, sampling_params, lora_request=lora_request)
preds = [result.outputs[0].text for result in results]
with open(save_name, "w", encoding="utf-8") as f:
for text, pred, label in zip(prompts, preds, labels):
f.write(json.dumps({"prompt": text, "predict": pred, "label": label}, ensure_ascii=False) + "\n")
print("*" * 70)
print(f"{len(prompts)} generated results have been saved at {save_name}.")
print("*" * 70)
if name == "main": fire.Fire(vllm_infer)
Command: DISABLE_VERSION_CHECK=1 python scripts/vllm_image.py --model_name_or_path /mlx_devbox/users/zhaomeng.2000/playground/InternVL3-8B-hf --adapter_name_or_path /mlx_devbox/users/zhaomeng.2000/playground/InternVL3-image/model_8B
The official vLLM code only supports the InternVL-Chat version, so the -hf model is not recognized. I will add an internvl-hf -> internvl-chat conversion when I have time. The same problem is reported here: https://github.com/hiyouga/LLaMA-Factory/pull/7258#issuecomment-2858717733
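A quick way to see why this happens is to check whether the checkpoint's architecture is registered in vLLM at all; this is only a sanity-check sketch, and the local path is a placeholder:

# Sketch: if the architecture in config.json is not in vLLM's model registry,
# vLLM 0.8.x falls back to its generic Transformers backend, which is not
# registered as multimodal and therefore rejects `limit_mm_per_prompt`.
from transformers import AutoConfig
from vllm import ModelRegistry

config = AutoConfig.from_pretrained(
    "/path/to/InternVL3-8B-hf",  # placeholder: your local -hf checkpoint
    trust_remote_code=True,
)
supported = set(ModelRegistry.get_supported_archs())
for arch in config.architectures or []:
    print(arch, "-> registered in vLLM:", arch in supported)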
With an internvl3-hf model fine-tuned via LLaMA-Factory SFT, starting the vLLM server fails with a different error: AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'
Partial traceback:
...
File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_worker.py", line 162, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1332, in load_model
self.model = get_model(vllm_config=self.vllm_config)
File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
model = _initialize_model(vllm_config=vllm_config)
File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
File "/usr/local/lib/python3.9/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/transformers.py", line 377, in __init__
self.model = TransformersModel(vllm_config=vllm_config, prefix=prefix)
File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/transformers.py", line 157, in __init__
config.vocab_size,
File "/usr/local/lib/python3.9/dist-packages/transformers/configuration_utils.py", line 211, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'
...
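For context, the lookup fails because the -hf InternVLConfig keeps the language-model fields (including vocab_size) on its nested text_config, while vLLM's generic Transformers backend reads them from the top-level config. A small check, assuming the public -hf repo id:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("OpenGVLab/InternVL3-8B-hf")
print(hasattr(cfg, "vocab_size"))   # False: not exposed on the top-level InternVLConfig
print(cfg.text_config.vocab_size)   # the LM vocab size lives on the nested text config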
We now offer a simple script for those who want to serve InternVL-series models with vLLM after training the -hf version.
[!Warning]
- The following pipeline is only verified with InternVL3-2B-hf with lm_type: qwen2. More checks are needed.
- Replace the tokenizer, which is needed if you add special tokens in training.
Quick usage:
python ${this_script} --input_dir ./saves/internvl3-2b/sft/llava_1k/checkpoint-500 --output_dir internvl3-2b-chat
mv internvl3-2b-chat/model.safetensors InternVL3-2B/ # replace
vllm serve InternVL3-2B
- Convert your saved checkpoint to the InternVL-Chat version with the following script.
- Replace the original model.safetensors with the converted one in the original dir.
- Use vllm serve "{your replaced dir}".
import argparse
import gc
import os
import re
import torch
from einops import rearrange
from transformers import InternVLForConditionalGeneration, AutoConfig
from safetensors import safe_open
from safetensors.torch import save_file
LM_TYPE_CORRESPONDENCE = {
"OpenGVLab/InternVL2_5-1B-MPO": "qwen2",
"OpenGVLab/InternVL2_5-2B-MPO": "llama",
"OpenGVLab/InternVL2_5-4B-MPO": "qwen2",
"OpenGVLab/InternVL2_5-8B-MPO": "llama",
"OpenGVLab/InternVL2_5-26B-MPO": "llama",
"OpenGVLab/InternVL2_5-38B-MPO": "qwen2",
"OpenGVLab/InternVL2_5-78B-MPO": "qwen2",
"OpenGVLab/InternVL3-1B": "qwen2",
"OpenGVLab/InternVL3-2B": "qwen2",
"OpenGVLab/InternVL3-8B": "qwen2",
"OpenGVLab/InternVL3-9B": "llama",
"OpenGVLab/InternVL3-14B": "qwen2",
"OpenGVLab/InternVL3-38B": "qwen2",
"OpenGVLab/InternVL3-78B": "qwen2",
}
# Reverse mapping dictionaries
CONVERTED_TO_ORIGINAL_KEY_MAPPING_VISION = {
r"model\.vision_tower": r"vision_model",
r"layer": r"layers",
r"cls_token": r"class_embedding",
r"position_embeddings": r"position_embedding",
r"patch_embeddings.projection": r"patch_embedding",
r"lambda_(\d+)": r"ls\1",
r"attention.projection_layers": r"attn.proj",
r"attention.projection_dropout": r"attn.dropout",
r"attention": r"attn",
r"layersnorm_before": r"norm1",
r"layersnorm_after": r"norm2",
}
CONVERTED_TO_ORIGINAL_KEY_MAPPING_TEXT_LLAMA = {
r"embed_tokens": r"tok_embeddings",
r"self_attn.o_proj": r"attention.wo",
r"mlp.gate_proj": r"feed_forward.w1",
r"mlp.down_proj": r"feed_forward.w2",
r"mlp.up_proj": r"feed_forward.w3",
r"input_layernorm": r"attention_norm",
r"post_attention_layernorm": r"ffn_norm",
r"lm_head": r"output",
}
CONVERTED_TO_ORIGINAL_KEY_MAPPING_MULTI = {
r"model.multi_modal_projector.layer_norm": r"mlp1.0",
r"model.multi_modal_projector.linear_1": r"mlp1.1",
r"model.multi_modal_projector.linear_2": r"mlp1.3",
}
def convert_new_keys_to_old_keys(state_dict_keys, lm_type):
"""Convert HF format keys back to original format"""
output_dict = {}
# Vision model keys
vision_keys = [key for key in state_dict_keys if key.startswith("model.vision_tower")]
vision_keys_text = "\n".join(vision_keys)
# now we should replace the vision_atten qkv
new_vision_text = vision_keys_text
for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_VISION.items():
new_vision_text = re.sub(pattern, replacement, new_vision_text)
output_dict.update(dict(zip(vision_keys, new_vision_text.split("\n"))))
# Language model keys
language_keys = [key for key in state_dict_keys if key.startswith("model.language_model") or key.startswith("lm_head")]
language_keys_text = "\n".join(language_keys)
language_keys_text = language_keys_text.replace("model.language_model", "language_model.model") # reverse order of keys
new_language_text = language_keys_text
if lm_type == "llama":
for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_TEXT_LLAMA.items():
new_language_text = re.sub(pattern, replacement, new_language_text)
output_dict.update(dict(zip(language_keys, new_language_text.split("\n"))))
# Multi-modal keys
multi_keys = [key for key in state_dict_keys if key.startswith("model.multi_modal_projector")]
multi_keys_text = "\n".join(multi_keys)
new_multi_text = multi_keys_text
for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_MULTI.items():
new_multi_text = re.sub(pattern, replacement, new_multi_text)
output_dict.update(dict(zip(multi_keys, new_multi_text.split("\n"))))
return output_dict
def recombine_attention_weights(hf_state_dict, lm_type, config):
"""
Recombine the separated attention weights back into original format
Mainly for visual parts of the model
"""
new_state_dict = {}
# Process vision model attention weights
vision_keys = [k for k in list(hf_state_dict.keys()) if k.startswith("model.vision_tower")]
for key in vision_keys:
if "attention.q_proj" in key and "bias" not in key:
# model.vision_tower
base_key = key.replace("attention.q_proj", "attn.qkv")
q_weights = hf_state_dict[key]
k_weights = hf_state_dict[key.replace("q_proj", "k_proj")]
v_weights = hf_state_dict[key.replace("q_proj", "v_proj")]
# Concatenate q, k, v weights
qkv_weights = torch.cat([q_weights, k_weights, v_weights], dim=0)
new_state_dict[base_key.replace("model.vision_tower", "vision_model")] = qkv_weights
elif "attention.q_proj" in key and "bias" in key:
base_key = key.replace("attention.q_proj", "attn.qkv") # attn.qkv.bias
q_bias = hf_state_dict[key]
k_bias = hf_state_dict[key.replace("q_proj", "k_proj")]
v_bias = hf_state_dict[key.replace("q_proj", "v_proj")]
qkv_bias = torch.cat([q_bias, k_bias, v_bias], dim=0)
new_state_dict[base_key.replace("model.vision_tower", "vision_model")] = qkv_bias
# del new_state_dict[key]
# del new_state_dict[key.replace("q_proj", "k_proj")]
# del new_state_dict[key.replace("q_proj", "v_proj")]
elif "attention.k_proj" in key or "attention.v_proj" in key:
continue
else:
# Copy other weights directly
new_state_dict[key] = hf_state_dict[key]
# Process language model attention weights - specific to model type
if lm_type == "llama":
for key in hf_state_dict.keys():
if "self_attn.q_proj" in key:
# For Llama models, reconstruct combined wqkv
base_key = key.replace("self_attn.q_proj", "attention.wqkv")
q_weights = hf_state_dict[key]
k_weights = hf_state_dict[key.replace("q_proj", "k_proj")]
v_weights = hf_state_dict[key.replace("q_proj", "v_proj")]
# Reconstruct wqkv based on model configuration
num_heads = config.text_config.num_attention_heads
num_kv_heads = config.text_config.num_key_value_heads
head_dim = config.text_config.hidden_size // num_heads
num_key_value_groups = num_heads // num_kv_heads
# Reshape to get individual head weights
q_heads = q_weights.view(num_heads, head_dim, -1)
k_heads = k_weights.view(num_kv_heads, head_dim, -1)
v_heads = v_weights.view(num_kv_heads, head_dim, -1)
# Recombine in the original wqkv format
# This is a complex process that depends on specific implementation details
if num_key_value_groups > 1:
# Handle grouped query attention case
wqkv = torch.cat([
q_heads.reshape(-1, q_heads.size(-1)),
k_heads.reshape(-1, k_heads.size(-1)),
v_heads.reshape(-1, v_heads.size(-1))
], dim=0)
else:
# Handle regular attention case
shapes = (num_heads, 2 + num_key_value_groups, head_dim, q_heads.size(-1))
wqkv_tensors = torch.zeros(shapes, device=q_heads.device, dtype=q_heads.dtype)
wqkv_tensors[:, :num_key_value_groups, ...] = q_heads.unsqueeze(1)
wqkv_tensors[:, -2, ...] = k_heads
wqkv_tensors[:, -1, ...] = v_heads
wqkv = wqkv_tensors.reshape(-1, q_heads.size(-1))
new_state_dict[base_key] = wqkv
elif "self_attn.k_proj" in key or "self_attn.v_proj" in key:
# Skip as handled in q_proj processing
continue
else:
new_key = key
# Add other conversions as needed
new_state_dict[new_key] = hf_state_dict[key]
else:
# For other model types (e.g., qwen2), copy all non-vision keys directly
# which is compatible with the original format
for key in hf_state_dict.keys():
if not key.startswith("vision_tower"):
new_state_dict[key] = hf_state_dict[key]
elif key.startswith("model.language_model"):
new_state_dict[key.replace("model.language_model", "language_model.model")] = hf_state_dict[key]
return new_state_dict
def reverse_convert_model(input_path, output_path):
"""Convert a HuggingFace format InternVL model back to the original format using safetensors"""
print(f"Loading HF model from {input_path}...")
# Determine model type from path or config
model_name = os.path.basename(input_path).replace("-hf", "")
lm_type = None
for original_name, _type in LM_TYPE_CORRESPONDENCE.items():
if model_name in original_name:
lm_type = _type
break
if lm_type is None:
# Default to qwen2 if unknown
print("Couldn't determine language model type, defaulting to qwen2")
lm_type = "qwen2"
print(f"Detected language model type: {lm_type}")
# Load model
config = AutoConfig.from_pretrained(input_path)
print("Loading model weights...")
hf_model = InternVLForConditionalGeneration.from_pretrained(
input_path,
torch_dtype=torch.bfloat16, # Use bfloat16 to reduce memory usage
low_cpu_mem_usage=True,
device_map='auto' # Use device_map to load large models
)
# Extract state dict
print("Extracting state dictionary...")
hf_state_dict = hf_model.state_dict()
# Check if state_dict is empty or very small
num_params = sum(p.numel() for p in hf_model.parameters())
print(f"Model has {num_params} parameters")
print(f"State dict has {len(hf_state_dict)} keys")
# 1. Rename keys to original format
print("Converting keys to original format...")
all_keys = list(hf_state_dict.keys())
key_mapping = convert_new_keys_to_old_keys(all_keys, lm_type)
# 2. Recombine attention weights
print("Recombining attention weights...")
original_state_dict = recombine_attention_weights(hf_state_dict, lm_type, config)
# 3. Apply key mapping
print("Applying key mapping to restore original format...")
final_state_dict = {}
for old_key, tensor in original_state_dict.items():
new_key = key_mapping.get(old_key, old_key)
if "qkv" in old_key: # hack for new key
final_state_dict[old_key.replace("layer", "layers")] = tensor.detach().clone()
elif "lm_head.weight" in old_key: # hardcode
final_state_dict["language_model.lm_head.weight"] = tensor.detach().clone()
else:
final_state_dict[new_key] = tensor.detach().clone() # Make sure we have a copy of the tensor
# 4. Save model in original format using safetensors
os.makedirs(output_path, exist_ok=True)
safetensors_path = os.path.join(output_path, "model.safetensors")
print(f"Saving model in safetensors format to {safetensors_path}")
# Convert to CPU before saving if on GPU
for key in list(final_state_dict.keys()):
if final_state_dict[key].device.type != 'cpu':
final_state_dict[key] = final_state_dict[key].cpu()
# Check tensor sizes before saving
total_size_gb = sum(tensor.numel() * tensor.element_size() for tensor in final_state_dict.values()) / 1024**3
print(f"Total size of state dict to save: {total_size_gb:.2f} GB")
keys_to_remove = [k for k in final_state_dict.keys() if '_proj' in k and "vision" in k]
if keys_to_remove:
print(f"Removing {len(keys_to_remove)} keys containing '_proj'...")
for key in keys_to_remove:
print(f" Removing: {key}")
del final_state_dict[key]
print(f"After removal, state dict contains {len(final_state_dict)} keys")
# Save each key-value pair in the final state dict
try:
print("Saving tensors...")
save_file(final_state_dict, safetensors_path)
print(f"Successfully saved model to {safetensors_path}")
# Verify the saved file
file_size_gb = os.path.getsize(safetensors_path) / 1024**3
print(f"Saved file size: {file_size_gb:.2f} GB")
# Optionally verify we can read the saved file
print("Verifying saved file...")
with safe_open(safetensors_path, framework="pt", device="cpu") as f:
keys = f.keys()
print(f"SafeTensors file contains {len(keys)} keys")
except Exception as e:
print(f"Error saving model: {e}")
# Fallback to PyTorch format if safetensors fails
print("Falling back to PyTorch binary format...")
torch.save(final_state_dict, os.path.join(output_path, "pytorch_model.bin"))
# Clean up to free memory
del hf_model, hf_state_dict, original_state_dict, final_state_dict
gc.collect()
torch.cuda.empty_cache() if torch.cuda.is_available() else None
print("Model conversion complete")
def check_model_conversion(original_model_path, converted_model_path):
"""
Compare the original model state dict with the converted model state dict
to ensure the conversion was successful.
"""
print(f"Checking model conversion between {original_model_path} and {converted_model_path}...")
# Load original model state dict using AutoModel
print("Loading original model state dict...")
try:
from transformers import AutoModel
# Load original model and get state dict
original_model = AutoModel.from_pretrained(
original_model_path,
torch_dtype=torch.bfloat16, # Use lower precision to reduce memory usage
low_cpu_mem_usage=True,
use_flash_attn=False,
trust_remote_code=True,
).eval()
original_state_dict = original_model.state_dict()
# Free up memory
del original_model
gc.collect()
torch.cuda.empty_cache() if torch.cuda.is_available() else None
except Exception as e:
print(f"Error loading original model: {e}")
print("Trying to load state dict directly...")
# Fallback to loading state dict files directly
try:
# Try safetensors first
original_safetensors_path = os.path.join(original_model_path, "model.safetensors")
if os.path.exists(original_safetensors_path):
with safe_open(original_safetensors_path, framework="pt", device="cpu") as f:
original_state_dict = {k: f.get_tensor(k) for k in f.keys()}
else:
# Fall back to PyTorch format
original_bin_path = os.path.join(original_model_path, "pytorch_model.bin")
if os.path.exists(original_bin_path):
original_state_dict = torch.load(original_bin_path, map_location="cpu")
else:
raise FileNotFoundError(f"Could not find model files in {original_model_path}")
except Exception as e:
print(f"Error loading original model state dict: {e}")
return
# Load converted model state dict
print("Loading converted model state dict...")
try:
converted_safetensors_path = os.path.join(converted_model_path, "model.safetensors")
if os.path.exists(converted_safetensors_path):
with safe_open(converted_safetensors_path, framework="pt", device="cpu") as f:
converted_state_dict = {k: f.get_tensor(k) for k in f.keys()}
else:
# Fall back to PyTorch format
converted_bin_path = os.path.join(converted_model_path, "pytorch_model.bin")
if os.path.exists(converted_bin_path):
converted_state_dict = torch.load(converted_bin_path, map_location="cpu")
else:
raise FileNotFoundError(f"Could not find model files in {converted_model_path}")
except Exception as e:
print(f"Error loading converted model: {e}")
return
# Compare state dicts
original_keys = set(original_state_dict.keys())
converted_keys = set(converted_state_dict.keys())
# Check for missing keys
missing_in_converted = original_keys - converted_keys
missing_in_original = converted_keys - original_keys
common_keys = original_keys.intersection(converted_keys)
print(f"Total keys in original model: {len(original_keys)}")
print(f"Total keys in converted model: {len(converted_keys)}")
print(f"Keys missing in converted model: {len(missing_in_converted)}")
print(f"Extra keys in converted model: {len(missing_in_original)}")
print(f"Common keys: {len(common_keys)}")
if missing_in_converted:
print("\nSample of missing keys in converted model:")
for key in list(missing_in_converted)[:10]:
print(f" {key}")
if missing_in_original:
print("\nSample of extra keys in converted model:")
for key in list(missing_in_original)[:200]:
print(f" {key}")
# Check tensor shapes and values for common keys
shape_mismatches = []
value_mismatches = []
max_diff = 0.0
for key in common_keys:
orig_tensor = original_state_dict[key]
conv_tensor = converted_state_dict[key]
# Check shapes
if orig_tensor.shape != conv_tensor.shape:
shape_mismatches.append((key, orig_tensor.shape, conv_tensor.shape))
continue
# Check values (sample a few elements to avoid excessive memory usage)
try:
if orig_tensor.numel() > 1000:
# Sample elements for large tensors
indices = torch.randint(0, orig_tensor.numel(), (1000,))
orig_sample = orig_tensor.view(-1)[indices]
conv_sample = conv_tensor.view(-1)[indices]
diff = torch.abs(orig_sample - conv_sample).max().item()
else:
diff = torch.abs(orig_tensor - conv_tensor).max().item()
max_diff = max(max_diff, diff)
# Consider a significant difference as a mismatch (adjust threshold as needed)
if diff > 1e-3:
value_mismatches.append((key, diff))
except Exception as e:
print(f"Error comparing values for key {key}: {e}")
print(f"\nShape mismatches: {len(shape_mismatches)}")
if shape_mismatches:
print("Sample of shape mismatches:")
for key, orig_shape, conv_shape in shape_mismatches[:10]:
print(f" {key}: original {orig_shape} vs converted {conv_shape}")
print(f"\nValue mismatches: {len(value_mismatches)}")
if value_mismatches:
print("Sample of value mismatches:")
for key, diff in sorted(value_mismatches[:10], key=lambda x: x[1], reverse=True):
print(f" {key}: max difference = {diff}")
print(f"\nOverall maximum difference in tensor values: {max_diff}")
if len(shape_mismatches) == 0 and len(value_mismatches) == 0 and len(missing_in_converted) == 0:
print("\nCONVERSION CHECK PASSED: Model conversion appears to be successful!")
else:
print("\nCONVERSION CHECK FAILED: There are differences between the original and converted models.")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--input_dir", default="OpenGVLab/InternVL3-2B-hf",
help="Location of HF format InternVL model")
parser.add_argument("--output_dir", default="InternVL3-2B-original",
help="Location to write original format model")
parser.add_argument("--lm_type", default=None, choices=["llama", "qwen2"],
help="Language model type (llama or qwen2), will be auto-detected if not specified")
args = parser.parse_args()
# If lm_type was manually specified, override auto-detection
if args.lm_type is not None:
print(f"Using manually specified language model type: {args.lm_type}")
lm_type = args.lm_type
else:
lm_type = None
reverse_convert_model(args.input_dir, args.output_dir)
# unitest
# check_model_conversion("OpenGVLab/InternVL3-2B", args.output_dir)
if __name__ == "__main__":
main()
@piamo @zhaomeng1234456 @FloSophorae
@Kuangdd01 I have already converted it to the chat model, but I still get this problem when using vllm serve.
I used the config.json from the original chat model.
@Kuangdd01
Hi, first of all, thank you so much for providing the fine-tuning code for InternVL3. I really appreciate your work and contribution to the open-source community.
I have fine-tuned the InternVL3-8B model using LoRA via LLaMA-Factory. Now, I would like to use the resulting LoRA-adapted weights for inference with vLLM. However, I’m not sure how to properly load or integrate these weights into a vLLM-based inference pipeline.
Could you kindly advise me on how to use a LoRA-tuned InternVL3-8B (trained via LLaMA-Factory) with vLLM?
Thank you in advance for your support!
First, export your fine-tuned model after merging the LoRA adapter.
Second, try the above script to convert the -hf model to the -chat model, which vLLM supports.
Then, replace the safetensors in OpenGVLab/InternVL3-8B with the converted ones; we only want to reuse its config.
Finally, vllm serve <your_path>.
If you add more (special) tokens during training, please carefully replace the following configs with the configs in your adapter directory (see the sketch below for one way to do this).
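For that last step, one way to carry added special tokens over is to copy the tokenizer artifacts from your training output into the serving directory. This is only a sketch: the file names are typical tokenizer files and the paths are placeholders, so check what your adapter directory actually contains.

# Sketch: copy tokenizer files from the training checkpoint into the serving dir
# so that added special tokens are picked up. File names and paths are assumptions.
import shutil
from pathlib import Path

trained_dir = Path("/path/to/your/sft_checkpoint")         # placeholder
serve_dir = Path("/path/to/InternVL3-8B_after_replacing")  # dir holding the original chat configs

for name in ("tokenizer_config.json", "tokenizer.json",
             "special_tokens_map.json", "added_tokens.json"):
    src = trained_dir / name
    if src.exists():
        shutil.copy2(src, serve_dir / name)
        print("copied", name)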
First of all, thank you very much for the fast and helpful response.
I followed your advice and successfully completed the conversion using the following command:
Step 1: convert the -hf model to the -chat model
python convert_hf_to_chat.py --input_dir /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596 --output_dir /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat
Step 2: replace the safetensors (I think this is where the problem is in my case...)
mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/
Step 3: using vLLM
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import random
import glob
from vllm import LLM, SamplingParams
from PIL import Image
adapter_path = '/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596'
llm = LLM(
# model="OpenGVLab/InternVL3-8B-hf",
model="OpenGVLab/InternVL3-8B",
trust_remote_code=True,
enable_lora=True
)
sampling_params = SamplingParams(
temperature=0.7,
max_tokens=1024
)
img_file_list = glob.glob('path/to/img_dir/*.jpg')
img_file_path = random.choice(img_file_list)
image = Image.open(img_file_path)
instruction1 = "Hi"
inputs = {
"prompt": instruction1,
"multi_modal_data": {"image": image}
}
lora_request = {
"lora_name": "internvl3_lora",
"lora_path": adapter_path
}
outputs = llm.generate([inputs], sampling_params, lora_request=lora_request)
result = outputs[0].outputs[0].text
print(result)
Despite these steps, I still encounter an error when executing the script above.
Do you have any suggestions for what might be going wrong? Thank you again for your support! 🙏
Currently this script does not support converting the LoRA adapter itself; you should merge the LoRA adapter into the HF model and then convert the whole checkpoint (see the sketch below). Indeed, we need an extra LoRA conversion script for InternVL...
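For reference, the usual route to a merged HF checkpoint is `llamafactory-cli export`; the rough equivalent with peft looks like the sketch below. Paths are placeholders and this has not been verified on every InternVL3 size.

# Sketch: merge a LoRA adapter into the InternVL3 -hf base model, then run the
# hf -> chat conversion script on the merged directory. Paths are placeholders.
import torch
from peft import PeftModel
from transformers import AutoProcessor, InternVLForConditionalGeneration

base = InternVLForConditionalGeneration.from_pretrained(
    "OpenGVLab/InternVL3-8B-hf", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
)
model = PeftModel.from_pretrained(base, "/path/to/lora_checkpoint")  # your adapter dir
merged = model.merge_and_unload()                                    # fold LoRA weights into the base
merged.save_pretrained("/path/to/merged_hf_model")                   # then convert this dir to -chat
AutoProcessor.from_pretrained("OpenGVLab/InternVL3-8B-hf").save_pretrained("/path/to/merged_hf_model")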
As you suggested, I first exported my fine-tuned model after merging the LoRA adapter using LLaMA Factory. After that, I proceeded with the steps I mentioned earlier (Step 1–3). I apologize for any confusion I may have caused.
Considering this, could you kindly help me once again to identify a possible solution? (I suspect I may have made a mistake when replacing the safetensor file.)
mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/
You should move these safetensors to a local dir that contains the configs from https://huggingface.co/OpenGVLab/InternVL3-8B. Then the vLLM Python file should be:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import random
import glob
from vllm import LLM, SamplingParams
from PIL import Image
# adapter_path = '/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596'
llm = LLM(
model="OpenGVLab/InternVL3-8B_after_replacing",
trust_remote_code=True,
)
sampling_params = SamplingParams(
temperature=0.7,
max_tokens=1024
)
img_file_list = glob.glob('path/to/img_dir/*.jpg')
img_file_path = random.choice(img_file_list)
image = Image.open(img_file_path)
instruction1 = "Hi"
inputs = {
"prompt": instruction1,
"multi_modal_data": {"image": image}
}
outputs = llm.generate([inputs], sampling_params)
result = outputs[0].outputs[0].text
print(result)
Thank you so much for your helpful comment and guidance.
In addition to what you mentioned, I also found that deleting the model.safetensors.index.json file from the "OpenGVLab/InternVL3-8B_after_replacing" weight directory was necessary for vLLM to run properly. After removing this file, everything worked as expected.
Thanks again for your support!
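For anyone hitting the same thing, the stale shard index can be removed with a couple of lines; the path below is a placeholder.

# The conversion writes a single model.safetensors, so the old sharded index
# (model.safetensors.index.json) no longer matches and confuses the loader.
import os

index_path = "InternVL3-8B_after_replacing/model.safetensors.index.json"  # placeholder
if os.path.exists(index_path):
    os.remove(index_path)
    print("removed stale", index_path)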
@Kuangdd01
Could you explain in more detail how to replace those 5 json files? I added extra tokens during training. When I replace the json files in the original chat model with the ones from my fully fine-tuned checkpoint and then run inference with official vLLM, the results get worse, whereas inference through LLaMA-Factory's Hugging Face API works fine.
Just replacing the four files below should be enough.
Hi, I want to run inference with InternVL3-8B, but it fails because the multimodal model is not recognized. While making the changes following your answer at https://github.com/hiyouga/LLaMA-Factory/issues/8086#issuecomment-2898640569, I ran into the problem below.
Thanks for your attention and reply!
I downloaded the code and weights from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main into the InternVL3-8B folder.
Then I ran: python scripts/convert_ckpt/intern3-vl-8b.py --input_dir InternVL3-8B --output_dir saves/internvl3-8b-chat
Why do I get the following error?
Traceback (most recent call last):
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 468, in <module>
    main()
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 461, in main
    reverse_convert_model(args.input_dir, args.output_dir)
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 209, in reverse_convert_model
    hf_model = InternVLForConditionalGeneration.from_pretrained(
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4574, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5031, in _load_pretrained_model
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in _load_state_dict_into_meta_model
    _load_parameter_into_model(model, param_name, param.to(param_device))
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 731, in _load_parameter_into_model
    module.load_state_dict({param_type: tensor}, strict=False, assign=True)
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Embedding: size mismatch for weight: copying a param with shape torch.Size([151674, 3584]) from checkpoint, the shape in current model is torch.Size([151936, 4096]).
@Kuangdd01 I tried running inference on the saved checkpoint with the template provided on Hugging Face:

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

torch_device = "cuda"
model_checkpoint = "OpenGVLab/InternVL3-1B-hf"
processor = AutoProcessor.from_pretrained(model_checkpoint)
model = AutoModelForImageTextToText.from_pretrained(model_checkpoint, device_map=torch_device, torch_dtype=torch.bfloat16)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]

inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=50)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)

decoded_output

and compared it with inference on the saved checkpoint through API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api.
The two approaches give very different results, and the API results are the correct ones.
@Kuangdd01
Regarding the discrepancy between Hugging Face template inference and llamafactory-cli api: it may be this issue: https://github.com/hiyouga/LLaMA-Factory/issues/8136
As for the size-mismatch error: it looks like new tokens were added to the tokenizer, so the embedding/lm_head no longer matches the original index.json.
Understood, thanks.
Maybe this part is wrong?
@Kuangdd01
import re
# ...
if "qkv" in old_key:
# only replace the inner `.layer.`, so `layers` is not accidentally rewritten
safe_key = re.sub(r"\.layer\.", ".layers.", old_key)
final_state_dict[safe_key] = tensor.detach().clone()
This part should be fine; you can verify it with check_model_conversion() (see the example below).
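For example, a call like the one below mirrors the commented-out check at the bottom of the conversion script and can be added to its main(); the paths are placeholders.

# Sketch: verify the conversion by comparing the converted weights against the
# original chat checkpoint, using the check_model_conversion() helper above.
check_model_conversion(
    "OpenGVLab/InternVL3-8B",    # original chat model (or a local copy)
    "saves/internvl3-8b-chat",   # output_dir produced by the conversion
)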
I am now using git clone https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main instead of https://huggingface.co/OpenGVLab/InternVL3-8B-hf/tree/main.
The earlier error
RuntimeError: Error(s) in loading state_dict for Embedding:
size mismatch for weight: copying a param with shape torch.Size([151674, 3584]) from checkpoint, the shape in current model is torch.Size([151936, 4096]).
was solved by loading the model like this:
hf_model = AutoModelForCausalLM.from_pretrained(
    input_path,
    config=config,            # added this line
    trust_remote_code=True,   # otherwise InternVL cannot be loaded
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map='auto'
)
But now, when I run the command
DISABLE_VERSION_CHECK=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/vllm_infer_intern.py --model_name_or_path /code/LLaMA-Factory/ckpts/InternVL3-8B --template intern_vl --dataset tt_img_text_ndcg_test1_allin_score --save_name generated_predicitons_test1_zs_allin_score_inter.jsonl
I get the following error:
[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/tiger/.local/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/tiger/.local/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 688, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3525, in _map_single
[rank0]: for i, batch in iter_outputs(shard_iterable):
[rank0]: File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3475, in iter_outputs
[rank0]: yield i, apply_function(example, i, offset=offset)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3398, in apply_function
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/processor/unsupervised.py", line 68, in preprocess_dataset
[rank0]: input_ids, labels = self._encode_data_example(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/processor/unsupervised.py", line 46, in _encode_data_example
[rank0]: messages = self.template.mm_plugin.process_messages(messages, images, videos, audios, self.processor)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 605, in process_messages
[rank0]: self._validate_input(processor, images, videos, audios)
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]: raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
[rank0]: """
My model_name_or_path directory does contain the preprocessor_config.json pulled via git clone from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main, whose content is:
{
"crop_size": 448,
"do_center_crop": true,
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "CLIPFeatureExtractor",
"image_mean": [
0.485,
0.456,
0.406
],
"image_std": [
0.229,
0.224,
0.225
],
"resample": 3,
"size": 448
}
I would sincerely like to ask whether you have any ideas on how to handle this error, because I need to use vLLM for batch inference over an image-text dataset. The vllm serve approach does not work well for me since my machine blocks all external ports. Thanks for your reply @Kuangdd01
The processor_config.json in the converted directory cannot be recognized by llamafactory. Apply a manual hack so that the loaded tokenizer/processor comes from the InternVL3-hf version:
# hack for internvl-hf processor
# tokenizer_module = load_tokenizer(model_args) =>
tokenizer_module = load_tokenizer("local-internvl3-hf-dir")
Do you mean that I should load the config from
https://huggingface.co/OpenGVLab/InternVL3-8B-hf/tree/main
rather than from
https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main ?
I git cloned OpenGVLab/InternVL3-8B-hf to my local machine,
then moved the converted model.safetensors into the OpenGVLab/InternVL3-8B-hf folder and deleted the original model shards together with the shard index.json.
But after running
DISABLE_VERSION_CHECK=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/vllm_infer_intern.py --model_name_or_path /code/LLaMA-Factory/ckpts/InternVL3-8B-hf --template intern_vl --dataset tt_img_text_ndcg_test1_allin_score --save_name generated_predicitons_test1_zs_allin_score_inter.jsonl
it errors with
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
self.multimodal_config = self._init_multimodal_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
It seems this got me even less far than my first approach haha 😭
Sorry, I think my explanation has been a bit muddled; let me restate it. Using your code, I converted InternVL3-8B and InternVL3-8B-hf separately. For InternVL3-8B, after the conversion I ran into the processor-not-found problem:
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 605, in process_messages
[rank0]: self._validate_input(processor, images, videos, audios)
[rank0]: File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]: raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
I suspect this is because vLLM cannot recognize this kind of non-hf config. As for InternVL3-8B-hf, after the conversion I applied the earlier hf config, but it still errors:
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
self.multimodal_config = self._init_multimodal_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
😭 @Kuangdd01
The llamafactory intern_vl template depends on the -hf tokenizer and processor, which is why the vllm_infer script reports that the processor cannot be found. The model that vLLM loads can keep using the converted weights plus the non-hf version of the config; what you need to modify specifically is the logic here:
# here you need the script to load the -hf version's configs (tokenizer_config.json, processor_config.json, ...)
tokenizer_module = load_tokenizer(model_args)  # the path loaded here can only be the internvl3-hf version
tokenizer = tokenizer_module["tokenizer"]
The diff is: the processor has to be loaded from the pre-conversion (-hf) files, while the model has to be loaded from the converted weights.
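To make the split concrete, here is a rough sketch under those assumptions (paths are placeholders, and the engine arguments simply mirror the ones the script already passes; the tokenizer/processor are assumed to come from the -hf directory via the load_tokenizer hack above):

```python
from vllm import LLM, SamplingParams

converted_dir = "/code/LLaMA-Factory/ckpts/InternVL3-8B"  # converted weights + non-hf config (placeholder)

# The vLLM engine keeps loading the converted checkpoint; only the template's
# tokenizer/processor come from the -hf clone (see the load_tokenizer hack above).
llm = LLM(
    model=converted_dir,
    trust_remote_code=True,  # needed for the non-hf InternVL remote code
    limit_mm_per_prompt={"image": 10, "video": 2, "audio": 2},
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=512)
# inputs would be the prompts plus multi_modal_data built with the -hf processor:
# outputs = llm.generate(inputs, sampling_params)
```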
Yes, my current weights were converted with the code you provided, and the config is the InternVL3-8B-hf cloned directly from HF, but it still errors when I run it @Kuangdd01
=== ModelConfig Parameters ===
model: /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/ckpts/InternVL3-8B-hf
task: auto
tokenizer: /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/ckpts/InternVL3-8B-hf
tokenizer_mode: auto
trust_remote_code: True
allowed_local_media_path:
dtype: auto
seed: 0
revision: None
code_revision: None
rope_scaling: None
rope_theta: None
hf_overrides: None
tokenizer_revision: None
max_model_len: 4096
quantization: None
enforce_eager: None
max_seq_len_to_capture: 8192
max_logprobs: 20
disable_sliding_window: False
skip_tokenizer_init: False
served_model_name: None
limit_mm_per_prompt: {'image': 10, 'video': 2, 'audio': 2}
use_async_output_proc: True
config_format: ConfigFormat.AUTO
mm_processor_kwargs: None
disable_mm_preprocessor_cache: False
override_neuron_config: None
override_pooler_config: None
logits_processor_pattern: None
generation_config: None
override_generation_config: None
enable_sleep_mode: False
model_impl: auto
[INFO|configuration_utils.py:710] 2025-06-24 06:19:13,862 >> loading configuration file /code/LLaMA-Factory/ckpts/InternVL3-8B-hf/config.json
[INFO|configuration_utils.py:710] 2025-06-24 06:19:13,863 >> loading configuration file /code/LLaMA-Factory/ckpts/InternVL3-8B-hf/config.json
[INFO|configuration_utils.py:775] 2025-06-24 06:19:13,865 >> Model config InternVLConfig {
"architectures": [
"InternVLForConditionalGeneration"
],
"downsample_ratio": 0.5,
"image_seq_length": 256,
"image_token_id": 151667,
"model_type": "internvl",
"projector_hidden_act": "gelu",
"text_config": {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 70,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"factor": 2.0,
"rope_type": "dynamic",
"type": "dynamic"
},
"rope_theta": 1000000.0,
"sliding_window": null,
"torch_dtype": "bfloat16",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151674
},
"torch_dtype": "bfloat16",
"transformers_version": "4.53.0.dev0",
"vision_config": {
"architectures": [
"InternVisionModel"
],
"attention_bias": true,
"attention_dropout": 0.0,
"dropout": 0.0,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.0,
"hidden_size": 1024,
"image_size": [
448,
448
],
"initializer_factor": 0.1,
"initializer_range": 1e-10,
"intermediate_size": 4096,
"layer_norm_eps": 1e-06,
"layer_scale_init_value": 0.1,
"model_type": "internvl_vision",
"norm_type": "layer_norm",
"num_attention_heads": 16,
"num_channels": 3,
"num_hidden_layers": 24,
"patch_size": [
14,
14
],
"projection_dropout": 0.0,
"torch_dtype": "bfloat16",
"use_absolute_position_embeddings": true,
"use_mask_token": false,
"use_mean_pooling": true,
"use_qk_norm": false
},
"vision_feature_layer": -1,
"vision_feature_select_strategy": "default"
}
Traceback (most recent call last):
File "/code/LLaMA-Factory/scripts/vllm_infer_intern.py", line 211, in <module>
fire.Fire(vllm_infer)
File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/code/LLaMA-Factory/scripts/vllm_infer_intern.py", line 124, in vllm_infer
llm = LLM(**engine_args)
^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/utils.py", line 1022, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 242, in __init__
self.llm_engine = self.engine_class.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 486, in from_engine_args
engine_config = engine_args.create_engine_config(usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1165, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1085, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
self.multimodal_config = self._init_multimodal_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
I found out why!!!! In /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/src/llamafactory/model/loader.py, change
processor = AutoProcessor.from_pretrained(model_args.model_name_or_path, **init_kwargs)
to
init_kwargs["trust_remote_code"] = True
processor = AutoProcessor.from_pretrained(
'OpenGVLab/InternVL3-8B-hf',
**init_kwargs
)
### But!!!! The important part is!!!!!
What I converted was internvl3-8B, not internvl3-8B-hf, and after the conversion I kept doing all subsequent steps inside the internvl3-8B folder. With that, inference via the vllm script under scripts works. In addition, note that I replaced the five config files of internvl3-8B with the config files from internvl3-8B-hf, and I disabled some version & model checks. If needed, you can post your error and we can discuss it together.
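A rough sketch of that config-file swap (the specific file names below are only assumptions for illustration; use whichever config/tokenizer/processor JSONs your two clones actually contain):

```python
import shutil
from pathlib import Path

hf_dir = Path("/code/LLaMA-Factory/ckpts/InternVL3-8B-hf")  # clone of the -hf repo (placeholder)
model_dir = Path("/code/LLaMA-Factory/ckpts/InternVL3-8B")  # folder with the converted weights (placeholder)

# NOTE: hypothetical list of the "five config files"; adjust to your local clones.
config_files = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "processor_config.json",
    "tokenizer_config.json",
]

for name in config_files:
    src = hf_dir / name
    if src.exists():
        shutil.copy2(src, model_dir / name)
        print(f"replaced {name}")
```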
Although I've hit some bugs, I'm still debugging. Thanks @Kuangdd01 for the patient answers!