
Using vLLM to run inference with InternVL3-8B-hf returns ValueError: `limit_mm_per_prompt` is only supported for multimodal models.

Open zhaomeng1234456 opened this issue 7 months ago • 3 comments

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

Package Version Editable project location


accelerate 1.6.0 aiofiles 22.1.0 aiohappyeyeballs 2.6.1 aiohttp 3.11.18 aiosignal 1.3.2 aiosqlite 0.21.0 airportsdata 20250224 alabaster 0.7.16 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.9.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astor 0.8.1 asttokens 3.0.0 attrs 25.3.0 audioread 3.0.1 av 14.3.0 babel 2.16.0 beautifulsoup4 4.13.3 blake3 1.0.4 bleach 6.2.0 blinker 1.5 blobfile 3.0.0 byted-mario-collector 2.0.8 byted-remote-ikernel 0.4.8 byted-torch 2.5.1.post1 byted-wandb 0.13.86 bytedance-context 0.7.1 bytedance.hdfs-stdenv 0.0.39 bytedance-metrics 0.5.2 bytedbackgrounds 0.0.6 byteddatabus 1.0.6 byteddps 0.1.2 bytedenv 0.6.4 bytedmemfd 0.2 bytedmetrics 0.10.2 bytedservicediscovery 0.18.0 bytedztijwthelper 0.0.23 bytedztispiffe 0.0.16 cachetools 5.5.2 certifi 2024.8.30 cffi 1.17.1 chardet 5.1.0 charset-normalizer 3.4.0 click 8.1.8 cloudpickle 3.1.1 comm 0.2.2 compressed-tensors 0.9.3 contourpy 1.3.2 cryptography 44.0.2 cupy-cuda12x 13.4.1 cycler 0.12.1 datasets 3.5.0 dbus-python 1.3.2 debugpy 1.8.14 decorator 5.2.1 deepspeed 0.16.7 defusedxml 0.7.1 Deprecated 1.2.18 depyf 0.18.0 devscripts 2.23.4+deb12u1 dill 0.3.8 diskcache 5.6.3 distro 1.8.0 distro-info 1.5+deb12u1 dnspython 2.7.0 docker-pycreds 0.4.0 docstring_parser 0.16 docutils 0.19 einops 0.8.1 email_validator 2.2.0 entrypoints 0.4 enum34 1.1.10 executing 2.2.0 fastapi 0.115.12 fastapi-cli 0.0.7 fastjsonschema 2.21.1 fastrlock 0.8.3 ffmpy 0.5.0 filelock 3.16.1 findspark 2.0.1 fire 0.7.0 fonttools 4.57.0 fqdn 1.5.1 frozenlist 1.6.0 fsspec 2024.10.0 gguf 0.16.3 gitdb 4.0.12 GitPython 3.1.44 googleapis-common-protos 1.70.0 gpg 1.18.0 gradio_client 1.8.0 greenlet 3.1.1 groovy 0.1.2 grpcio 1.71.0 h11 0.16.0 hf-xet 1.0.5 hjson 3.1.0 httpcore 1.0.9 httplib2 0.20.4 httptools 0.6.4 httpx 0.28.1 huggingface-hub 0.30.2 idna 3.10 imagesize 1.4.1 importlib_metadata 8.0.0 interegular 0.3.3 iotop 0.6 ipaddress 1.0.23 ipykernel 6.29.5 ipython 9.0.2 ipython-genutils 0.2.0 ipython_pygments_lexers 1.1.1 ipywidgets 8.1.5 isoduration 20.11.0 jedi 0.19.2 jieba 0.42.1 Jinja2 3.1.6 jiter 0.9.0 joblib 1.4.2 json5 0.10.0 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2024.10.1 jupyter 1.0.0 jupyter_client 7.4.9 jupyter-console 6.6.3 jupyter_core 5.7.2 jupyter-events 0.12.0 jupyter-kernel-gateway 2.5.2 jupyter_server 2.15.0 jupyter_server_fileid 0.9.3 jupyter_server_terminals 0.5.3 jupyter_server_ydoc 0.8.0 jupyter-ydoc 0.2.5 jupyterlab 3.6.8 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 jupyterlab_widgets 3.0.13 kiwisolver 1.4.8 lark 1.2.2 lazr.restfulclient 0.14.5 lazr.uri 1.0.6 lazy_loader 0.4 librosa 0.11.0 llamafactory 0.9.3.dev0 /mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory llguidance 0.7.19 llvmlite 0.44.0 lm-format-enforcer 0.10.11 lxml 5.4.0 markdown-it-py 3.0.0 MarkupSafe 3.0.2 matplotlib 3.10.1 matplotlib-inline 0.1.7 mdurl 0.1.2 merlin_kernel 0.1 mistral_common 1.5.4 mistune 3.1.3 mlx-python-sdk 0.3.0 modelscope 1.25.0 mpmath 1.3.0 msgpack 1.0.8 msgspec 0.19.0 multidict 6.4.3 multiprocess 0.70.16 nbclassic 1.2.0 nbclient 0.10.2 nbconvert 7.16.6 nbformat 5.10.4 nest-asyncio 1.6.0 networkx 3.4.2 ninja 1.11.1.4 nltk 3.9.1 none 0.1.1 notebook 6.5.7 notebook_shim 0.2.4 numba 0.61.2 numpy 1.26.4 nvidia-cublas-cu12 12.4.5.8 nvidia-cuda-cupti-cu12 12.4.127 nvidia-cuda-nvrtc-cu12 12.4.127 nvidia-cuda-runtime-cu12 12.4.127 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.2.1.3 nvidia-curand-cu12 10.3.5.147 nvidia-cusolver-cu12 11.6.1.9 nvidia-cusparse-cu12 12.3.1.170 nvidia-cusparselt-cu12 
0.6.2 nvidia-ml-py 12.570.86 nvidia-nccl-cu12 2.21.5 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.4.127 oauthlib 3.2.2 omegaconf 2.3.0 openai 1.76.0 opencv-python-headless 4.11.0.86 opentelemetry-api 1.26.0 opentelemetry-exporter-otlp 1.26.0 opentelemetry-exporter-otlp-proto-common 1.26.0 opentelemetry-exporter-otlp-proto-grpc 1.26.0 opentelemetry-exporter-otlp-proto-http 1.26.0 opentelemetry-proto 1.26.0 opentelemetry-sdk 1.26.0 opentelemetry-semantic-conventions 0.47b0 opentelemetry-semantic-conventions-ai 0.4.3 orjson 3.10.16 outlines 0.1.11 outlines_core 0.1.26 overrides 7.7.0 packaging 24.1 pandas 2.2.3 pandocfilters 1.5.1 parso 0.8.4 partial-json-parser 0.2.1.1.post5 pathtools 0.1.2 peft 0.15.1 pexpect 4.8.0 pillow 11.2.1 pip 25.0.1 platformdirs 4.3.7 pooch 1.8.2 prometheus_client 0.21.1 prometheus-fastapi-instrumentator 7.1.0 promise 2.3 prompt_toolkit 3.0.50 propcache 0.3.1 protobuf 3.20.3 psutil 7.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 py4j 0.10.9.9 pyairports 2.1.1 pyarrow 19.0.1 pycountry 24.6.1 pycparser 2.22 pycryptodomex 3.22.0 pydantic 2.10.6 pydantic_core 2.27.2 pydub 0.25.1 Pygments 2.18.0 PyGObject 3.42.2 PyJWT 2.6.0 pyOpenSSL 25.0.0 pyparsing 3.0.9 python-apt 2.6.0 python-dateutil 2.9.0.post0 python-debian 0.1.49 python-dotenv 1.0.1 python-json-logger 3.3.0 python-magic 0.4.26 python-multipart 0.0.20 pytz 2025.2 pyxdg 0.28 PyYAML 6.0.2 pyzmq 26.3.0 qtconsole 5.6.1 QtPy 2.4.3 ray 2.43.0 reactivex 4.0.4 referencing 0.36.2 regex 2024.11.6 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986 2.0.0 rfc3986-validator 0.1.1 rich 14.0.0 rich-toolkit 0.14.3 rouge-chinese 1.0.3 rpds-py 0.23.1 ruff 0.11.7 safehttpx 0.1.6 safetensors 0.5.3 schedule 1.2.2 scikit-learn 1.6.1 scipy 1.15.2 semantic-version 2.10.0 Send2Trash 1.8.3 sentencepiece 0.2.0 sentry-sdk 2.24.0 setproctitle 1.3.5 setuptools 65.7.0 shellingham 1.5.4 shortuuid 1.0.13 shtab 1.7.2 six 1.16.0 smmap 5.0.2 sniffio 1.3.1 snowballstemmer 2.2.0 soundfile 0.13.1 soupsieve 2.6 soxr 0.5.0.post1 Sphinx 5.3.0 sphinxcontrib-applehelp 2.0.0 sphinxcontrib-devhelp 2.0.0 sphinxcontrib-htmlhelp 2.1.0 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 2.0.0 sphinxcontrib-serializinghtml 2.0.0 sphinxcontrib-websupport 2.0.0 SQLAlchemy 2.0.27 sse-starlette 2.3.3 stack-data 0.6.3 starlette 0.46.2 sympy 1.13.1 termcolor 3.0.1 terminado 0.18.1 threadpoolctl 3.6.0 tiktoken 0.9.0 tinycss2 1.4.0 tokenizers 0.21.1 tomlkit 0.13.2 torch 2.6.0 torchaudio 2.6.0 torchvision 0.21.0 tornado 6.4.2 tqdm 4.67.1 traitlets 5.14.3 transformers 4.52.0.dev0 triton 3.2.0 trl 0.9.6 typer 0.15.2 types-python-dateutil 2.9.0.20241206 typing_extensions 4.12.2 tyro 0.8.14 tzdata 2025.2 ujson 5.10.0 unattended-upgrades 0.1 unidiff 0.7.3 unzip 1.0.0 uri-template 1.3.0 urllib3 1.26.20 uvicorn 0.34.2 uvloop 0.21.0 vllm 0.8.5 wadllib 1.3.6 watchfiles 1.0.5 wcwidth 0.2.13 webcolors 24.11.1 webencodings 0.5.1 websocket-client 1.8.0 websockets 15.0.1 wheel 0.44.0 widgetsnbextension 4.0.13 wrapt 1.17.2 xdg 5 xformers 0.0.29.post2 xgrammar 0.1.18 xxhash 3.5.0 y-py 0.6.2 yarl 1.20.0 ypy-websocket 0.8.4 zipp 3.21.0

Reproduction

Traceback (most recent call last):
  File "/mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory/scripts/vllm_image.py", line 169, in <module>
    fire.Fire(vllm_infer)
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory/scripts/vllm_image.py", line 157, in vllm_infer
    results = LLM(**engine_args).generate(inputs, sampling_params, lora_request=lora_request)
  File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 1161, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 247, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 503, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/arg_utils.py", line 1099, in create_engine_config
    model_config = self.create_model_config()
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/arg_utils.py", line 987, in create_model_config
    return ModelConfig(
  File "/usr/local/lib/python3.11/dist-packages/vllm/config.py", line 517, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/usr/local/lib/python3.11/dist-packages/vllm/config.py", line 590, in _init_multimodal_config
    raise ValueError("limit_mm_per_prompt is only supported for "
ValueError: limit_mm_per_prompt is only supported for multimodal models.

Others

This is the vLLM inference script:

# Copyright 2025 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from typing import Optional

import fire
from transformers import Seq2SeqTrainingArguments

from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.extras.misc import get_device_count
from llamafactory.extras.packages import is_vllm_available
from llamafactory.hparams import get_infer_args
from llamafactory.model import load_tokenizer


if is_vllm_available():
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest


def vllm_infer(
    model_name_or_path: str,
    adapter_name_or_path: str = None,
    dataset: str = "Qwen-921",
    dataset_dir: str = "/mlx_devbox/users/zhaomeng.2000/playground/LLama-Factory/data",
    template: str = "intern_vl",
    cutoff_len: int = 2048,
    max_samples: Optional[int] = None,
    vllm_config: str = "{}",
    save_name: str = "/mlx_devbox/users/zhaomeng.2000/playground/InternVL3-image/result/result-InternVL3-8B-NoPe-921.jsonl",
    temperature: float = 0.6,
    top_p: float = 0.95,
    top_k: int = 50,
    max_new_tokens: int = 1024,
    repetition_penalty: float = 1.0,
    skip_special_tokens: bool = True,
    seed: Optional[int] = None,
    pipeline_parallel_size: int = 1,
    image_max_pixels: int = 768 * 768,
    image_min_pixels: int = 32 * 32,
    video_fps: float = 2.0,
    video_maxlen: int = 128,
):
    r"""Perform batch generation using vLLM engine, which supports tensor parallelism.

    Usage: python vllm_infer.py --model_name_or_path meta-llama/Llama-2-7b-hf --template llama --dataset alpaca_en_demo
    """
    if pipeline_parallel_size > get_device_count():
        raise ValueError("Pipeline parallel size should be smaller than the number of gpus.")

    model_args, data_args, _, generating_args = get_infer_args(
        dict(
            model_name_or_path=model_name_or_path,
            adapter_name_or_path=adapter_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=cutoff_len,
            max_samples=max_samples,
            preprocessing_num_workers=16,
            vllm_config=vllm_config,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            max_new_tokens=max_new_tokens,
            repetition_penalty=repetition_penalty,
        )
    )

    training_args = Seq2SeqTrainingArguments(output_dir="dummy_dir")
    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    template_obj = get_template_and_fix_tokenizer(tokenizer, data_args)
    template_obj.mm_plugin.expand_mm_tokens = False  # for vllm generate
    dataset_module = get_dataset(template_obj, model_args, data_args, training_args, "ppo", **tokenizer_module)

    inputs, prompts, labels = [], [], []
    for sample in dataset_module["train_dataset"]:
        if sample["images"]:
            multi_modal_data = {
                "image": template_obj.mm_plugin._regularize_images(
                    sample["images"], image_max_pixels=image_max_pixels, image_min_pixels=image_min_pixels
                )["images"]
            }
        elif sample["videos"]:
            multi_modal_data = {
                "video": template_obj.mm_plugin._regularize_videos(
                    sample["videos"],
                    image_max_pixels=image_max_pixels,
                    image_min_pixels=image_min_pixels,
                    video_fps=video_fps,
                    video_maxlen=video_maxlen,
                )["videos"]
            }
        elif sample["audios"]:
            audio_data = template_obj.mm_plugin._regularize_audios(
                sample["audios"],
                sampling_rate=16000,
            )
            multi_modal_data = {"audio": zip(audio_data["audios"], audio_data["sampling_rates"])}
        else:
            multi_modal_data = None

        inputs.append({"prompt_token_ids": sample["input_ids"], "multi_modal_data": multi_modal_data})
        prompts.append(tokenizer.decode(sample["input_ids"], skip_special_tokens=skip_special_tokens))
        labels.append(
            tokenizer.decode(
                list(filter(lambda x: x != IGNORE_INDEX, sample["labels"])), skip_special_tokens=skip_special_tokens
            )
        )

    sampling_params = SamplingParams(
        repetition_penalty=generating_args.repetition_penalty or 1.0,  # repetition_penalty must > 0
        temperature=generating_args.temperature,
        top_p=generating_args.top_p or 1.0,  # top_p must > 0
        top_k=generating_args.top_k or -1,  # top_k must > 0
        stop_token_ids=template_obj.get_stop_token_ids(tokenizer),
        max_tokens=generating_args.max_new_tokens,
        skip_special_tokens=skip_special_tokens,
        seed=seed,
    )
    if model_args.adapter_name_or_path is not None:
        lora_request = LoRARequest("default", 1, model_args.adapter_name_or_path[0])
    else:
        lora_request = None

    engine_args = {
        "model": model_args.model_name_or_path,
        "trust_remote_code": True,
        "dtype": model_args.infer_dtype,
        "max_model_len": cutoff_len + max_new_tokens,
        "tensor_parallel_size": (get_device_count() // pipeline_parallel_size) or 1,
        "pipeline_parallel_size": pipeline_parallel_size,
        "disable_log_stats": True,
        "enable_lora": model_args.adapter_name_or_path is not None,
        # "dtype": "float16"
    }
    if template_obj.mm_plugin.__class__.__name__ != "BasePlugin":
        engine_args["limit_mm_per_prompt"] = {"image": 4, "video": 2, "audio": 2}

    if isinstance(model_args.vllm_config, dict):
        engine_args.update(model_args.vllm_config)

    results = LLM(**engine_args).generate(inputs, sampling_params, lora_request=lora_request)
    preds = [result.outputs[0].text for result in results]
    with open(save_name, "w", encoding="utf-8") as f:
        for text, pred, label in zip(prompts, preds, labels):
            f.write(json.dumps({"prompt": text, "predict": pred, "label": label}, ensure_ascii=False) + "\n")

    print("*" * 70)
    print(f"{len(prompts)} generated results have been saved at {save_name}.")
    print("*" * 70)


if __name__ == "__main__":
    fire.Fire(vllm_infer)

Command executed: DISABLE_VERSION_CHECK=1 python scripts/vllm_image.py --model_name_or_path /mlx_devbox/users/zhaomeng.2000/playground/InternVL3-8B-hf --adapter_name_or_path /mlx_devbox/users/zhaomeng.2000/playground/InternVL3-image/model_8B

zhaomeng1234456 avatar May 16 '25 05:05 zhaomeng1234456

vLLM's official code only supports the InternVL-Chat version, so the -hf model is not recognized as multimodal. I will add an internvl-hf -> internvl-chat conversion when I have time. The same problem is discussed here: https://github.com/hiyouga/LLaMA-Factory/pull/7258#issuecomment-2858717733

Kuangdd01 avatar May 16 '25 05:05 Kuangdd01
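Until that conversion lands, a minimal workaround sketch for the ValueError above (an assumption, not an official fix; it assumes the engine_args, inputs, sampling_params and lora_request objects built in the script in the issue body, and it only avoids the crash, it does not make vLLM treat the -hf checkpoint as multimodal):

from vllm import LLM

try:
    llm = LLM(**engine_args)
except ValueError as err:
    # vLLM raises this when the checkpoint is not registered as a multimodal model,
    # so retry without the multimodal-only argument instead of crashing.
    if "limit_mm_per_prompt" in str(err):
        engine_args.pop("limit_mm_per_prompt", None)
        llm = LLM(**engine_args)
    else:
        raise

results = llm.generate(inputs, sampling_params, lora_request=lora_request)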

I fine-tuned an internvl3-hf model with LLaMA-Factory SFT, and when starting the vLLM server I get a different error: AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'

Partial traceback:

...
  File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_worker.py", line 162, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1332, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
    model = _initialize_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.9/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
    old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/transformers.py", line 377, in __init__
    self.model = TransformersModel(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/transformers.py", line 157, in __init__
    config.vocab_size,
  File "/usr/local/lib/python3.9/dist-packages/transformers/configuration_utils.py", line 211, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'
...

piamo avatar May 21 '25 04:05 piamo
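For reference, the attribute that vLLM's Transformers fallback reads in this traceback lives under text_config in the -hf config rather than at the top level. A quick diagnostic sketch (the checkpoint path is a placeholder):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("/path/to/your/internvl3-hf-checkpoint")  # placeholder path
print(hasattr(cfg, "vocab_size"))   # False: InternVLConfig has no top-level vocab_size
print(cfg.text_config.vocab_size)   # the language-model vocab size vLLM was trying to read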

Now we offer a simple script for those who want to serve InternVL-series models with vLLM after training the -hf version.

[!Warning]

  1. The following pipeline is only verified with InternVL3-2B-hf with lm_type: qwen2. More checks are needed.
  2. Replace the tokenizer files as well; this is needed if you added special tokens during training.

Quick usage:

python ${this_script} --input_dir ./saves/internvl3-2b/sft/llava_1k/checkpoint-500 --output_dir internvl3-2b-chat
mv internvl3-2b-chat/model.safetensors InternVL3-2B/ # replace
vllm serve InternVL3-2B
  1. Convert your saved checkpoint to the InternVL-Chat version with the following script.
  2. Replace the original model.safetensors with the converted one in the original dir.
  3. Run vllm serve "{your replaced dir}".
import argparse
import gc
import os
import re
import torch
from einops import rearrange
from transformers import InternVLForConditionalGeneration, AutoConfig
from safetensors import safe_open
from safetensors.torch import save_file

LM_TYPE_CORRESPONDENCE = {
    "OpenGVLab/InternVL2_5-1B-MPO": "qwen2",
    "OpenGVLab/InternVL2_5-2B-MPO": "llama",
    "OpenGVLab/InternVL2_5-4B-MPO": "qwen2",
    "OpenGVLab/InternVL2_5-8B-MPO": "llama",
    "OpenGVLab/InternVL2_5-26B-MPO": "llama",
    "OpenGVLab/InternVL2_5-38B-MPO": "qwen2",
    "OpenGVLab/InternVL2_5-78B-MPO": "qwen2",
    "OpenGVLab/InternVL3-1B": "qwen2",
    "OpenGVLab/InternVL3-2B": "qwen2",
    "OpenGVLab/InternVL3-8B": "qwen2",
    "OpenGVLab/InternVL3-9B": "llama",
    "OpenGVLab/InternVL3-14B": "qwen2",
    "OpenGVLab/InternVL3-38B": "qwen2",
    "OpenGVLab/InternVL3-78B": "qwen2",
}

# Reverse mapping dictionaries
CONVERTED_TO_ORIGINAL_KEY_MAPPING_VISION = {
    r"model\.vision_tower": r"vision_model",
    r"layer": r"layers",
    r"cls_token": r"class_embedding",
    r"position_embeddings": r"position_embedding",
    r"patch_embeddings.projection": r"patch_embedding",
    r"lambda_(\d+)": r"ls\1",
    r"attention.projection_layers": r"attn.proj",
    r"attention.projection_dropout": r"attn.dropout",
    r"attention": r"attn",
    r"layersnorm_before": r"norm1",
    r"layersnorm_after": r"norm2",
}

CONVERTED_TO_ORIGINAL_KEY_MAPPING_TEXT_LLAMA = {
    r"embed_tokens": r"tok_embeddings",
    r"self_attn.o_proj": r"attention.wo",
    r"mlp.gate_proj": r"feed_forward.w1",
    r"mlp.down_proj": r"feed_forward.w2",
    r"mlp.up_proj": r"feed_forward.w3",
    r"input_layernorm": r"attention_norm",
    r"post_attention_layernorm": r"ffn_norm",
    r"lm_head": r"output",
}

CONVERTED_TO_ORIGINAL_KEY_MAPPING_MULTI = {
    r"model.multi_modal_projector.layer_norm": r"mlp1.0",
    r"model.multi_modal_projector.linear_1": r"mlp1.1", 
    r"model.multi_modal_projector.linear_2": r"mlp1.3",
}

def convert_new_keys_to_old_keys(state_dict_keys, lm_type):
    """Convert HF format keys back to original format"""
    output_dict = {}
    
    # Vision model keys
    vision_keys = [key for key in state_dict_keys if key.startswith("model.vision_tower")]
    vision_keys_text = "\n".join(vision_keys)
    # now we should replace the vision_atten qkv
    new_vision_text = vision_keys_text
    for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_VISION.items():
        new_vision_text = re.sub(pattern, replacement, new_vision_text)
    output_dict.update(dict(zip(vision_keys, new_vision_text.split("\n"))))
    
    # Language model keys
    language_keys = [key for key in state_dict_keys if key.startswith("model.language_model") or key.startswith("lm_head")]
    language_keys_text = "\n".join(language_keys)
    language_keys_text = language_keys_text.replace("model.language_model", "language_model.model") # reverse order of keys
    new_language_text = language_keys_text
    if lm_type == "llama":
        for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_TEXT_LLAMA.items():
            new_language_text = re.sub(pattern, replacement, new_language_text)
    output_dict.update(dict(zip(language_keys, new_language_text.split("\n"))))
    
    # Multi-modal keys
    multi_keys = [key for key in state_dict_keys if key.startswith("model.multi_modal_projector")]
    multi_keys_text = "\n".join(multi_keys)
    new_multi_text = multi_keys_text
    for pattern, replacement in CONVERTED_TO_ORIGINAL_KEY_MAPPING_MULTI.items():
        new_multi_text = re.sub(pattern, replacement, new_multi_text)
    output_dict.update(dict(zip(multi_keys, new_multi_text.split("\n"))))
    
    return output_dict

def recombine_attention_weights(hf_state_dict, lm_type, config):
    """
    Recombine the separated attention weights back into original format
    Mainly for visual parts of the model
    """
    new_state_dict = {}
    
    # Process vision model attention weights
    vision_keys = [k for k in list(hf_state_dict.keys()) if k.startswith("model.vision_tower")]
    for key in vision_keys:
        if "attention.q_proj" in key and "bias" not in key:
            # model.vision_tower
            base_key = key.replace("attention.q_proj", "attn.qkv")
            q_weights = hf_state_dict[key]
            k_weights = hf_state_dict[key.replace("q_proj", "k_proj")]
            v_weights = hf_state_dict[key.replace("q_proj", "v_proj")]
            
            # Concatenate q, k, v weights
            qkv_weights = torch.cat([q_weights, k_weights, v_weights], dim=0)
            new_state_dict[base_key.replace("model.vision_tower", "vision_model")] = qkv_weights
        elif "attention.q_proj" in key and "bias" in key:
            base_key = key.replace("attention.q_proj", "attn.qkv") # attn.qkv.bias
            q_bias = hf_state_dict[key]
            k_bias = hf_state_dict[key.replace("q_proj", "k_proj")]
            v_bias = hf_state_dict[key.replace("q_proj", "v_proj")]
            qkv_bias = torch.cat([q_bias, k_bias, v_bias], dim=0)
            new_state_dict[base_key.replace("model.vision_tower", "vision_model")] = qkv_bias

            # del new_state_dict[key]
            # del new_state_dict[key.replace("q_proj", "k_proj")]
            # del new_state_dict[key.replace("q_proj", "v_proj")]
        elif "attention.k_proj" in key or "attention.v_proj" in key:
            continue
        else:
            # Copy other weights directly
            new_state_dict[key] = hf_state_dict[key]
    # Process language model attention weights - specific to model type
    if lm_type == "llama":
        for key in hf_state_dict.keys():
            if "self_attn.q_proj" in key:
                # For Llama models, reconstruct combined wqkv
                base_key = key.replace("self_attn.q_proj", "attention.wqkv")
                q_weights = hf_state_dict[key]
                k_weights = hf_state_dict[key.replace("q_proj", "k_proj")]
                v_weights = hf_state_dict[key.replace("q_proj", "v_proj")]
                
                # Reconstruct wqkv based on model configuration
                num_heads = config.text_config.num_attention_heads
                num_kv_heads = config.text_config.num_key_value_heads
                head_dim = config.text_config.hidden_size // num_heads
                num_key_value_groups = num_heads // num_kv_heads
                
                # Reshape to get individual head weights
                q_heads = q_weights.view(num_heads, head_dim, -1)
                k_heads = k_weights.view(num_kv_heads, head_dim, -1)
                v_heads = v_weights.view(num_kv_heads, head_dim, -1)
                
                # Recombine in the original wqkv format
                # This is a complex process that depends on specific implementation details
                if num_key_value_groups > 1:
                    # Handle grouped query attention case
                    wqkv = torch.cat([
                        q_heads.reshape(-1, q_heads.size(-1)),
                        k_heads.reshape(-1, k_heads.size(-1)),
                        v_heads.reshape(-1, v_heads.size(-1))
                    ], dim=0)
                else:
                    # Handle regular attention case
                    shapes = (num_heads, 2 + num_key_value_groups, head_dim, q_heads.size(-1))
                    wqkv_tensors = torch.zeros(shapes, device=q_heads.device, dtype=q_heads.dtype)
                    wqkv_tensors[:, :num_key_value_groups, ...] = q_heads.unsqueeze(1)
                    wqkv_tensors[:, -2, ...] = k_heads
                    wqkv_tensors[:, -1, ...] = v_heads
                    wqkv = wqkv_tensors.reshape(-1, q_heads.size(-1))
                
                new_state_dict[base_key] = wqkv
            elif "self_attn.k_proj" in key or "self_attn.v_proj" in key:
                # Skip as handled in q_proj processing
                continue
            else:
                new_key = key
                # Add other conversions as needed
                new_state_dict[new_key] = hf_state_dict[key]
    else:
        # For other model types (e.g., qwen2), copy all non-vision keys directly
        # which is compatible with the original format
        for key in hf_state_dict.keys():
            if not key.startswith("vision_tower"):
                new_state_dict[key] = hf_state_dict[key]
            elif key.startswith("model.language_model"):
                new_state_dict[key.replace("model.language_model", "language_model.model")] = hf_state_dict[key]
    
    return new_state_dict

def reverse_convert_model(input_path, output_path, lm_type=None):
    """Convert a HuggingFace format InternVL model back to the original format using safetensors"""
    print(f"Loading HF model from {input_path}...")

    # Determine model type from the path unless it was specified explicitly
    if lm_type is None:
        model_name = os.path.basename(input_path).replace("-hf", "")
        for original_name, _type in LM_TYPE_CORRESPONDENCE.items():
            if model_name in original_name:
                lm_type = _type
                break
    if lm_type is None:
        # Default to qwen2 if unknown
        print("Couldn't determine language model type, defaulting to qwen2")
        lm_type = "qwen2"
    
    print(f"Detected language model type: {lm_type}")
    
    # Load model
    config = AutoConfig.from_pretrained(input_path)
    print("Loading model weights...")
    hf_model = InternVLForConditionalGeneration.from_pretrained(
        input_path, 
        torch_dtype=torch.bfloat16,  # Use bfloat16 to reduce memory usage
        low_cpu_mem_usage=True,
        device_map='auto'  # Use device_map to load large models
    )
    
    # Extract state dict
    print("Extracting state dictionary...")
    hf_state_dict = hf_model.state_dict()
    
    # Check if state_dict is empty or very small
    num_params = sum(p.numel() for p in hf_model.parameters())
    print(f"Model has {num_params} parameters")
    print(f"State dict has {len(hf_state_dict)} keys")
    
    # 1. Rename keys to original format
    print("Converting keys to original format...")
    all_keys = list(hf_state_dict.keys())
    key_mapping = convert_new_keys_to_old_keys(all_keys, lm_type)
    
    # 2. Recombine attention weights
    print("Recombining attention weights...")
    original_state_dict = recombine_attention_weights(hf_state_dict, lm_type, config)
    
    # 3. Apply key mapping
    print("Applying key mapping to restore original format...")
    final_state_dict = {}
    for old_key, tensor in original_state_dict.items():
        new_key = key_mapping.get(old_key, old_key)
        if "qkv" in old_key: # hack for new key
            final_state_dict[old_key.replace("layer", "layers")] = tensor.detach().clone()
        elif "lm_head.weight" in old_key: # hardcode
            final_state_dict["language_model.lm_head.weight"] = tensor.detach().clone()
        else:
            final_state_dict[new_key] = tensor.detach().clone()  # Make sure we have a copy of the tensor
    
    # 4. Save model in original format using safetensors
    os.makedirs(output_path, exist_ok=True)
    safetensors_path = os.path.join(output_path, "model.safetensors")
    print(f"Saving model in safetensors format to {safetensors_path}")
    
    # Convert to CPU before saving if on GPU
    for key in list(final_state_dict.keys()):
        if final_state_dict[key].device.type != 'cpu':
            final_state_dict[key] = final_state_dict[key].cpu()
    
    # Check tensor sizes before saving
    total_size_gb = sum(tensor.numel() * tensor.element_size() for tensor in final_state_dict.values()) / 1024**3
    print(f"Total size of state dict to save: {total_size_gb:.2f} GB")
    
    keys_to_remove = [k for k in final_state_dict.keys() if '_proj' in k and "vision" in k]
    if keys_to_remove:
        print(f"Removing {len(keys_to_remove)} keys containing '_proj'...")
        for key in keys_to_remove:
            print(f"  Removing: {key}")
            del final_state_dict[key]
        print(f"After removal, state dict contains {len(final_state_dict)} keys")

    # Save each key-value pair in the final state dict
    try:
        print("Saving tensors...")
        save_file(final_state_dict, safetensors_path)
        print(f"Successfully saved model to {safetensors_path}")
        
        # Verify the saved file
        file_size_gb = os.path.getsize(safetensors_path) / 1024**3
        print(f"Saved file size: {file_size_gb:.2f} GB")
        
        # Optionally verify we can read the saved file
        print("Verifying saved file...")
        with safe_open(safetensors_path, framework="pt", device="cpu") as f:
            keys = f.keys()
            print(f"SafeTensors file contains {len(keys)} keys")
            
    except Exception as e:
        print(f"Error saving model: {e}")
        
        # Fallback to PyTorch format if safetensors fails
        print("Falling back to PyTorch binary format...")
        torch.save(final_state_dict, os.path.join(output_path, "pytorch_model.bin"))
    
    # Clean up to free memory
    del hf_model, hf_state_dict, original_state_dict, final_state_dict
    gc.collect()
    torch.cuda.empty_cache() if torch.cuda.is_available() else None
    
    print("Model conversion complete")

def check_model_conversion(original_model_path, converted_model_path):
    """
    Compare the original model state dict with the converted model state dict
    to ensure the conversion was successful.
    """
    print(f"Checking model conversion between {original_model_path} and {converted_model_path}...")
    
    # Load original model state dict using AutoModel
    print("Loading original model state dict...")
    try:
        from transformers import AutoModel
        
        # Load original model and get state dict
        original_model = AutoModel.from_pretrained(
            original_model_path,
            torch_dtype=torch.bfloat16,  # Use lower precision to reduce memory usage
            low_cpu_mem_usage=True,
            use_flash_attn=False,
            trust_remote_code=True,
        ).eval()
        
        original_state_dict = original_model.state_dict()
        
        # Free up memory
        del original_model
        gc.collect()
        torch.cuda.empty_cache() if torch.cuda.is_available() else None
        
    except Exception as e:
        print(f"Error loading original model: {e}")
        print("Trying to load state dict directly...")
        
        # Fallback to loading state dict files directly
        try:
            # Try safetensors first
            original_safetensors_path = os.path.join(original_model_path, "model.safetensors")
            if os.path.exists(original_safetensors_path):
                with safe_open(original_safetensors_path, framework="pt", device="cpu") as f:
                    original_state_dict = {k: f.get_tensor(k) for k in f.keys()}
            else:
                # Fall back to PyTorch format
                original_bin_path = os.path.join(original_model_path, "pytorch_model.bin")
                if os.path.exists(original_bin_path):
                    original_state_dict = torch.load(original_bin_path, map_location="cpu")
                else:
                    raise FileNotFoundError(f"Could not find model files in {original_model_path}")
        except Exception as e:
            print(f"Error loading original model state dict: {e}")
            return
    
    # Load converted model state dict
    print("Loading converted model state dict...")
    try:
        converted_safetensors_path = os.path.join(converted_model_path, "model.safetensors")
        if os.path.exists(converted_safetensors_path):
            with safe_open(converted_safetensors_path, framework="pt", device="cpu") as f:
                converted_state_dict = {k: f.get_tensor(k) for k in f.keys()}
        else:
            # Fall back to PyTorch format
            converted_bin_path = os.path.join(converted_model_path, "pytorch_model.bin")
            if os.path.exists(converted_bin_path):
                converted_state_dict = torch.load(converted_bin_path, map_location="cpu")
            else:
                raise FileNotFoundError(f"Could not find model files in {converted_model_path}")
    except Exception as e:
        print(f"Error loading converted model: {e}")
        return
    
    # Compare state dicts
    original_keys = set(original_state_dict.keys())
    converted_keys = set(converted_state_dict.keys())
    
    # Check for missing keys
    missing_in_converted = original_keys - converted_keys
    missing_in_original = converted_keys - original_keys
    common_keys = original_keys.intersection(converted_keys)
    
    print(f"Total keys in original model: {len(original_keys)}")
    print(f"Total keys in converted model: {len(converted_keys)}")
    print(f"Keys missing in converted model: {len(missing_in_converted)}")
    print(f"Extra keys in converted model: {len(missing_in_original)}")
    print(f"Common keys: {len(common_keys)}")
    
    if missing_in_converted:
        print("\nSample of missing keys in converted model:")
        for key in list(missing_in_converted)[:10]:
            print(f"  {key}")

    if missing_in_original:
        print("\nSample of extra keys in converted model:")
        for key in list(missing_in_original)[:200]:
            print(f"  {key}")
    
    # Check tensor shapes and values for common keys
    shape_mismatches = []
    value_mismatches = []
    max_diff = 0.0
    
    for key in common_keys:
        orig_tensor = original_state_dict[key]
        conv_tensor = converted_state_dict[key]
        
        # Check shapes
        if orig_tensor.shape != conv_tensor.shape:
            shape_mismatches.append((key, orig_tensor.shape, conv_tensor.shape))
            continue
        
        # Check values (sample a few elements to avoid excessive memory usage)
        try:
            if orig_tensor.numel() > 1000:
                # Sample elements for large tensors
                indices = torch.randint(0, orig_tensor.numel(), (1000,))
                orig_sample = orig_tensor.view(-1)[indices]
                conv_sample = conv_tensor.view(-1)[indices]
                diff = torch.abs(orig_sample - conv_sample).max().item()
            else:
                diff = torch.abs(orig_tensor - conv_tensor).max().item()
            
            max_diff = max(max_diff, diff)
            
            # Consider a significant difference as a mismatch (adjust threshold as needed)
            if diff > 1e-3:
                value_mismatches.append((key, diff))
        except Exception as e:
            print(f"Error comparing values for key {key}: {e}")
    
    print(f"\nShape mismatches: {len(shape_mismatches)}")
    if shape_mismatches:
        print("Sample of shape mismatches:")
        for key, orig_shape, conv_shape in shape_mismatches[:10]:
            print(f"  {key}: original {orig_shape} vs converted {conv_shape}")
    
    print(f"\nValue mismatches: {len(value_mismatches)}")
    if value_mismatches:
        print("Sample of value mismatches:")
        for key, diff in sorted(value_mismatches[:10], key=lambda x: x[1], reverse=True):
            print(f"  {key}: max difference = {diff}")
    
    print(f"\nOverall maximum difference in tensor values: {max_diff}")
    
    if len(shape_mismatches) == 0 and len(value_mismatches) == 0 and len(missing_in_converted) == 0:
        print("\nCONVERSION CHECK PASSED: Model conversion appears to be successful!")
    else:
        print("\nCONVERSION CHECK FAILED: There are differences between the original and converted models.")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", default="OpenGVLab/InternVL3-2B-hf", 
                      help="Location of HF format InternVL model")
    parser.add_argument("--output_dir", default="InternVL3-2B-original",
                      help="Location to write original format model")
    parser.add_argument("--lm_type", default=None, choices=["llama", "qwen2"],
                      help="Language model type (llama or qwen2), will be auto-detected if not specified")
    args = parser.parse_args()
    
    # If lm_type is manually specified, it overrides auto-detection inside reverse_convert_model
    if args.lm_type is not None:
        print(f"Using manually specified language model type: {args.lm_type}")

    reverse_convert_model(args.input_dir, args.output_dir, lm_type=args.lm_type)

    # unitest
    # check_model_conversion("OpenGVLab/InternVL3-2B", args.output_dir)


if __name__ == "__main__":
    main()

@piamo @zhaomeng1234456 @FloSophorae

Kuangdd01 avatar May 21 '25 17:05 Kuangdd01

(Quoting piamo's vocab_size traceback above.)

@Kuangdd01 I have already converted it to the chat model, but I still get this problem when using vllm serve.

qinb avatar May 27 '25 05:05 qinb

(Quoting piamo's traceback and qinb's reply above.)

You should use the config.json from the original chat model.

Kuangdd01 avatar May 27 '25 06:05 Kuangdd01

@Kuangdd01

Hi, first of all, thank you so much for providing the fine-tuning code for InternVL3. I really appreciate your work and contribution to the open-source community.

I have fine-tuned the InternVL3-8B model using LoRA via LLaMA-Factory. Now, I would like to use the resulting LoRA-adapted weights for inference with vLLM. However, I’m not sure how to properly load or integrate these weights into a vLLM-based inference pipeline.

Could you kindly advise me on how to use a LoRA-tuned InternVL3-8B (trained via LLaMA-Factory) with vLLM?

Thank you in advance for your support!

hongshi97 avatar May 29 '25 02:05 hongshi97

(Quoting hongshi97's question above.)

First, export your fine-tuned model after merging the LoRA adapter. Second, use the script above to convert the -hf model to the -chat model, which vLLM supports. Then replace the safetensors in a local copy of OpenGVLab/InternVL3-8B with the converted ones; we only want to reuse its config. Finally, run vllm serve <your_path>.

If you added more (special) tokens during training, please carefully replace the following configs with the configs from your adapter directory.

Image

Kuangdd01 avatar May 29 '25 02:05 Kuangdd01
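On the merge step above: LLaMA-Factory's llamafactory-cli export is the intended way to merge the adapter. Purely as an illustration, a minimal PEFT-based merge sketch (the model ID and paths are placeholders, not taken from this thread):

import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

# load the base -hf model and merge the LoRA adapter into it
base = AutoModelForImageTextToText.from_pretrained(
    "OpenGVLab/InternVL3-8B-hf", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "/path/to/lora/checkpoint").merge_and_unload()
merged.save_pretrained("/path/to/merged-hf-model")
# keep the processor/tokenizer next to the merged weights
AutoProcessor.from_pretrained("OpenGVLab/InternVL3-8B-hf").save_pretrained("/path/to/merged-hf-model")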

First of all, thank you very much for the fast and helpful response.

I followed your advice and successfully completed the conversion using the following command:

Step 1: convert the -hf model to the -chat model

python convert_hf_to_chat.py --input_dir /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596 --output_dir /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat

Step 2: replace the safetensors (I think this is where the problem lies in my case...)

mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/

Step 3: using vLLM

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import random
import glob 
from vllm import LLM, SamplingParams
from PIL import Image

adapter_path = '/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596'

llm = LLM(
    # model="OpenGVLab/InternVL3-8B-hf",
    model="OpenGVLab/InternVL3-8B",
    trust_remote_code=True,
    enable_lora=True
)

sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=1024
)

img_file_list = glob.glob('path/to/img_dir/*.jpg')
img_file_path = random.choice(img_file_list)
image = Image.open(img_file_path)

instruction1 = "Hi"

inputs = {
    "prompt": instruction1,
    "multi_modal_data": {"image": image}
}

lora_request = {
    "lora_name": "internvl3_lora",
    "lora_path": adapter_path
}

outputs = llm.generate([inputs], sampling_params, lora_request=lora_request)

result = outputs[0].outputs[0].text
print(result)

Despite these steps, I still encounter an error when executing the script above.

Do you have any suggestions for what might be going wrong? Thank you again for your support! 🙏

hongshi97 avatar May 29 '25 03:05 hongshi97

This script does not yet support converting the LoRA adapter directly; you should merge the LoRA adapter into the HF model and then convert the whole checkpoint. Indeed, we need an extra LoRA conversion script for InternVL...

Kuangdd01 avatar May 29 '25 03:05 Kuangdd01

As you suggested, I first exported my fine-tuned model after merging the LoRA adapter using LLaMA Factory. After that, I proceeded with the steps I mentioned earlier (Step 1–3). I apologize for any confusion I may have caused.

Considering this, could you kindly help me once again to identify a possible solution? (I suspect I may have made a mistake when replacing the safetensor file.)

hongshi97 avatar May 29 '25 03:05 hongshi97

mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/

You should move these safetensors into a local dir that contains the configs from https://huggingface.co/OpenGVLab/InternVL3-8B. Then the vLLM Python script should be:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import random
import glob 
from vllm import LLM, SamplingParams
from PIL import Image

# adapter_path = '/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596'

llm = LLM(
    model="OpenGVLab/InternVL3-8B_after_replacing",
    trust_remote_code=True,
)

sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=1024
)

img_file_list = glob.glob('path/to/img_dir/*.jpg')
img_file_path = random.choice(img_file_list)
image = Image.open(img_file_path)

instruction1 = "Hi"

inputs = {
    "prompt": instruction1,
    "multi_modal_data": {"image": image}
}


outputs = llm.generate([inputs], sampling_params)

result = outputs[0].outputs[0].text
print(result)

Kuangdd01 avatar May 29 '25 03:05 Kuangdd01

Thank you so much for your helpful comment and guidance.

In addition to what you mentioned, I also found that deleting the model.safetensors.index.json file from the "OpenGVLab/InternVL3-8B_after_replacing" weight directory was necessary for vLLM to run properly. After removing this file, everything worked as expected.

Thanks again for your support!

hongshi97 avatar May 30 '25 01:05 hongshi97
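On the model.safetensors.index.json point: for a sharded checkpoint that file maps every weight to a shard file, so after the shards are replaced by a single converted model.safetensors the map points at files that no longer exist. A quick check (the directory is a placeholder):

import json
import os

model_dir = "/path/to/InternVL3-8B_after_replacing"  # placeholder path
index_path = os.path.join(model_dir, "model.safetensors.index.json")
if os.path.exists(index_path):
    with open(index_path) as f:
        referenced = sorted(set(json.load(f)["weight_map"].values()))
    present = sorted(p for p in os.listdir(model_dir) if p.endswith(".safetensors"))
    print("index references:", referenced)
    print("files on disk:   ", present)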

@Kuangdd01

Image

Could you explain in more detail how to replace these 5 json files? I added extra tokens during training. When I replace the json files in the original chat model with the ones from my fully fine-tuned checkpoint and then run inference with official vLLM, the results get worse, while inference through the LLaMA-Factory HuggingFace-based API works fine.

blofn avatar Jun 12 '25 08:06 blofn

(Quoting blofn's question above.)

Just replacing the four files below should be enough.

Kuangdd01 avatar Jun 12 '25 12:06 Kuangdd01

Hello, I want to run inference with InternVL3-8B, but I get an error saying the multimodal model cannot be recognized. Following your answer at https://github.com/hiyouga/LLaMA-Factory/issues/8086#issuecomment-2898640569, I ran into the problem below. Thanks for your attention and reply! I downloaded the code and weights from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main into the InternVL3-8B folder, then ran python scripts/convert_ckpt/intern3-vl-8b.py --input_dir InternVL3-8B --output_dir saves/internvl3-8b-chat. Why do I get the following error?

Traceback (most recent call last):
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 468, in <module>
    main()
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 461, in main
    reverse_convert_model(args.input_dir, args.output_dir)
  File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 209, in reverse_convert_model
    hf_model = InternVLForConditionalGeneration.from_pretrained(
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4574, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5031, in _load_pretrained_model
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in _load_state_dict_into_meta_model
    _load_parameter_into_model(model, param_name, param.to(param_device))
  File "/home/tiger/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 731, in _load_parameter_into_model
    module.load_state_dict({param_type: tensor}, strict=False, assign=True)
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Embedding: size mismatch for weight: copying a param with shape torch.Size([151674, 3584]) from checkpoint, the shape in current model is torch.Size([151936, 4096]).

WjzZwd avatar Jun 13 '25 13:06 WjzZwd

@Kuangdd01 I tried running inference on the saved checkpoint with the template provided on Hugging Face:

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

torch_device = "cuda"
model_checkpoint = "OpenGVLab/InternVL3-1B-hf"
processor = AutoProcessor.from_pretrained(model_checkpoint)
model = AutoModelForImageTextToText.from_pretrained(model_checkpoint, device_map=torch_device, torch_dtype=torch.bfloat16)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=50)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

decoded_output

and compared it with serving the same saved checkpoint via API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api.

The outputs of the two approaches differ a lot, and the API-based inference results are the correct ones.

blofn avatar Jun 16 '25 03:06 blofn

(Quoting the size-mismatch traceback above.)

@Kuangdd01

WjzZwd avatar Jun 16 '25 03:06 WjzZwd

(Quoting blofn's comparison above.)

It might be this issue: https://github.com/hiyouga/LLaMA-Factory/issues/8136

Kuangdd01 avatar Jun 16 '25 03:06 Kuangdd01

(Quoting WjzZwd's size-mismatch traceback above.)

It looks like new tokens were added to the tokenizer, so the lm_head no longer matches the original index.json.
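
As a quick check of this diagnosis, the small sketch below (my addition, not part of the convert script) compares the tokenizer vocabulary with the sizes declared in the config; the attribute lookup assumes the non-hf InternVLChatConfig layout, so treat the names as assumptions.

from transformers import AutoConfig, AutoTokenizer

src = "InternVL3-8B"  # local clone of OpenGVLab/InternVL3-8B (example path)
tokenizer = AutoTokenizer.from_pretrained(src, trust_remote_code=True)
config = AutoConfig.from_pretrained(src, trust_remote_code=True)
# the non-hf config nests the language model settings; fall back gracefully if the name differs
llm_cfg = getattr(config, "llm_config", getattr(config, "text_config", config))

print("len(tokenizer):", len(tokenizer))   # should match the 151674 rows reported in the error
print("vocab_size:", llm_cfg.vocab_size)
print("hidden_size:", llm_cfg.hidden_size) # 3584 for the 8B checkpoint, not 4096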

Kuangdd01 avatar Jun 16 '25 03:06 Kuangdd01

(quoting the previous question and answer in full)

Understood, thanks.

WjzZwd avatar Jun 16 '25 07:06 WjzZwd

Maybe this part here is wrong? @Kuangdd01

import re

# ...
if "qkv" in old_key:
    # only replace the inner `.layer.`, so `layers` is not accidentally rewritten
    safe_key = re.sub(r"\.layer\.", ".layers.", old_key)
    final_state_dict[safe_key] = tensor.detach().clone()

WjzZwd avatar Jun 23 '25 04:06 WjzZwd

This part should be fine; you can check it with check_model_conversion().
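
If check_model_conversion() is not available in your copy of the script, a rough shape-level comparison along the following lines can serve a similar purpose; the paths and the use of safetensors metadata are assumptions for illustration.

import math
from pathlib import Path

from safetensors import safe_open

def shape_map(ckpt_dir):
    """Collect tensor name -> shape from every *.safetensors shard without loading the weights."""
    shapes = {}
    for shard in sorted(Path(ckpt_dir).glob("*.safetensors")):
        with safe_open(shard, framework="pt") as f:
            for name in f.keys():
                shapes[name] = tuple(f.get_slice(name).get_shape())
    return shapes

orig = shape_map("InternVL3-8B")             # pre-conversion weights (example path)
conv = shape_map("saves/internvl3-8b-chat")  # converted weights (example path)

print("tensor count:", len(orig), "->", len(conv))
print("total params:", sum(math.prod(s) for s in orig.values()),
      "->", sum(math.prod(s) for s in conv.values()))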

Kuangdd01 avatar Jun 23 '25 06:06 Kuangdd01

I am now git cloning https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main rather than https://huggingface.co/OpenGVLab/InternVL3-8B-hf/tree/main.

The earlier error

RuntimeError: Error(s) in loading state_dict for Embedding:
        size mismatch for weight: copying a param with shape torch.Size([151674, 3584]) from checkpoint, the shape in current model is torch.Size([151936, 4096]).

was solved by loading the model like this:

    hf_model = AutoModelForCausalLM.from_pretrained(
        input_path, 
        config=config,  # ✅ add this line
        trust_remote_code=True,  # ✅ otherwise InternVL cannot be loaded
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        device_map='auto'
    )

But now, after running DISABLE_VERSION_CHECK=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/vllm_infer_intern.py --model_name_or_path /code/LLaMA-Factory/ckpts/InternVL3-8B --template intern_vl --dataset tt_img_text_ndcg_test1_allin_score --save_name generated_predicitons_test1_zs_allin_score_inter.jsonl, I get the following error:

[rank0]: multiprocess.pool.RemoteTraceback: 
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/tiger/.local/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]:     result = (True, func(*args, **kwds))
[rank0]:                     ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/tiger/.local/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 688, in _write_generator_to_queue
[rank0]:     for i, result in enumerate(func(**kwargs)):
[rank0]:   File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3525, in _map_single
[rank0]:     for i, batch in iter_outputs(shard_iterable):
[rank0]:   File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3475, in iter_outputs
[rank0]:     yield i, apply_function(example, i, offset=offset)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/tiger/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3398, in apply_function
[rank0]:     processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/processor/unsupervised.py", line 68, in preprocess_dataset
[rank0]:     input_ids, labels = self._encode_data_example(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/processor/unsupervised.py", line 46, in _encode_data_example
[rank0]:     messages = self.template.mm_plugin.process_messages(messages, images, videos, audios, self.processor)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 605, in process_messages
[rank0]:     self._validate_input(processor, images, videos, audios)
[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]:     raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
[rank0]: """

My model_name_or_path directory does contain the preprocessor_config.json pulled from git clone https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main, with the following content:

{
    "crop_size": 448,
    "do_center_crop": true,
    "do_normalize": true,
    "do_resize": true,
    "feature_extractor_type": "CLIPFeatureExtractor",
    "image_mean": [
      0.485,
      0.456,
      0.406
    ],
    "image_std": [
      0.229,
      0.224,
      0.225
    ],
    "resample": 3,
    "size": 448
  }
  

May I sincerely ask whether you have any ideas for handling this error? I need to run vllm inference over an image-text dataset, and the vllm serve approach does not work well for me because my machine blocks all external ports. Thanks for the reply @Kuangdd01

WjzZwd avatar Jun 23 '25 08:06 WjzZwd

(quoting the previous comment in full)

The processor_config.json in the converted directory cannot be recognized by llamafactory. Apply a manual hack so that the loaded tokenizer/processor comes from the InternVL3-hf version:

# hack for internvl-hf processor
# tokenizer_module = load_tokenizer(model_args) =>
tokenizer_module = load_tokenizer("local-internvl3-hf-dir")
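
A concrete way to apply this inside scripts/vllm_infer_intern.py might look like the sketch below; the deepcopy trick, the dictionary keys, and the local -hf path are assumptions about how load_tokenizer() resolves ModelArguments in this repo.

from copy import deepcopy

# model_args keeps pointing at the converted checkpoint that vLLM will load;
# only the tokenizer/processor lookup is redirected to the -hf clone.
processor_args = deepcopy(model_args)
processor_args.model_name_or_path = "/code/LLaMA-Factory/ckpts/InternVL3-8B-hf"
tokenizer_module = load_tokenizer(processor_args)
tokenizer = tokenizer_module["tokenizer"]
processor = tokenizer_module["processor"]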

Kuangdd01 avatar Jun 23 '25 10:06 Kuangdd01

(quoting the previous question and answer in full)

Do you mean that the config should be loaded from https://huggingface.co/OpenGVLab/InternVL3-8B-hf/tree/main rather than from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main?

WjzZwd avatar Jun 24 '25 03:06 WjzZwd

(quoting the previous question and answer in full)

I git cloned OpenGVLab/InternVL3-8B-hf to a local folder, moved the converted model.safetensors into that OpenGVLab/InternVL3-8B-hf folder, and deleted the original model shards and the shard index.json. But after running DISABLE_VERSION_CHECK=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/vllm_infer_intern.py --model_name_or_path /code/LLaMA-Factory/ckpts/InternVL3-8B-hf --template intern_vl --dataset tt_img_text_ndcg_test1_allin_score --save_name generated_predicitons_test1_zs_allin_score_inter.jsonl, it errors with:

  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
    self.multimodal_config = self._init_multimodal_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
    raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.

It seems I got even less far than with my first approach, haha 😭

WjzZwd avatar Jun 24 '25 03:06 WjzZwd

(quoting the preceding exchange in full)

Sorry, I think my explanation got a bit muddled, so let me go through it again. Using your code I converted both InternVL3-8B and InternVL3-8B-hf. For InternVL3-8B, after conversion I hit the processor-not-found problem:

[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 605, in process_messages
[rank0]:     self._validate_input(processor, images, videos, audios)
[rank0]:   File "/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]:     raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.

My guess is that vllm cannot recognize this non-hf style config. As for InternVL3-8B-hf, after conversion I applied the original hf config, but it still errors:

  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
    self.multimodal_config = self._init_multimodal_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
    raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.

😭 @Kuangdd01
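
One quick way to see which branch vLLM is taking here (my addition; whether a given architecture string counts as multimodal depends on the installed vLLM version) is to inspect the architectures field of the config it is actually loading:

import json
from pathlib import Path

cfg = json.loads(Path("/code/LLaMA-Factory/ckpts/InternVL3-8B-hf/config.json").read_text())
print(cfg.get("architectures"))  # e.g. ["InternVLForConditionalGeneration"] for the -hf layout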

WjzZwd avatar Jun 24 '25 05:06 WjzZwd

(quoting the preceding exchange in full)

The llamafactory intern_vl template relies on the -hf tokenizer and processor, which is why the vllm_infer script complains that the processor cannot be found. The model loaded by vllm should keep using the converted weights plus the non-hf version of the config; what needs a targeted change is the logic below:

 # here you need the script to load the -hf version configs [tokenizer_config.json, processor_config...]
tokenizer_module = load_tokenizer(model_args) # the path loaded here can only be the internvl3-hf version
tokenizer = tokenizer_module["tokenizer"]

The difference is: the processor has to be loaded from the pre-conversion (-hf) files, while the model has to be loaded from the converted weights.
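
A minimal sketch of that split outside the LLaMA-Factory pipeline could look like the following; the paths, the bare image placeholder prompt, and the limit_mm_per_prompt values are assumptions for illustration only.

from PIL import Image
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

HF_DIR = "/code/LLaMA-Factory/ckpts/InternVL3-8B-hf"      # pre-conversion -hf repo: tokenizer/processor source
CONVERTED_DIR = "/code/LLaMA-Factory/ckpts/InternVL3-8B"  # converted weights + non-hf config

# LLaMA-Factory's intern_vl template consumes this processor when encoding the dataset.
processor = AutoProcessor.from_pretrained(HF_DIR)

# vLLM loads the converted weights; trust_remote_code lets it use the non-hf config.
llm = LLM(model=CONVERTED_DIR, trust_remote_code=True,
          limit_mm_per_prompt={"image": 10, "video": 2, "audio": 2})

# A bare image placeholder is used only to show how the image is attached;
# a real run should format the prompt with the intern_vl template.
outputs = llm.generate(
    {"prompt": "<image>\nPlease describe the image explicitly.",
     "multi_modal_data": {"image": Image.open("example.jpg")}},
    SamplingParams(max_tokens=50),
)
print(outputs[0].outputs[0].text)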

Kuangdd01 avatar Jun 24 '25 06:06 Kuangdd01

(quoting the preceding exchange in full)

Yes, my current weights are the ones converted with the code you provided, and the config is the InternVL3-8B-hf one cloned straight from HF, but running it still errors @Kuangdd01

=== ModelConfig Parameters ===
model: /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/ckpts/InternVL3-8B-hf
task: auto
tokenizer: /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/ckpts/InternVL3-8B-hf
tokenizer_mode: auto
trust_remote_code: True
allowed_local_media_path: 
dtype: auto
seed: 0
revision: None
code_revision: None
rope_scaling: None
rope_theta: None
hf_overrides: None
tokenizer_revision: None
max_model_len: 4096
quantization: None
enforce_eager: None
max_seq_len_to_capture: 8192
max_logprobs: 20
disable_sliding_window: False
skip_tokenizer_init: False
served_model_name: None
limit_mm_per_prompt: {'image': 10, 'video': 2, 'audio': 2}
use_async_output_proc: True
config_format: ConfigFormat.AUTO
mm_processor_kwargs: None
disable_mm_preprocessor_cache: False
override_neuron_config: None
override_pooler_config: None
logits_processor_pattern: None
generation_config: None
override_generation_config: None
enable_sleep_mode: False
model_impl: auto
[INFO|configuration_utils.py:710] 2025-06-24 06:19:13,862 >> loading configuration file /code/LLaMA-Factory/ckpts/InternVL3-8B-hf/config.json
[INFO|configuration_utils.py:710] 2025-06-24 06:19:13,863 >> loading configuration file /code/LLaMA-Factory/ckpts/InternVL3-8B-hf/config.json
[INFO|configuration_utils.py:775] 2025-06-24 06:19:13,865 >> Model config InternVLConfig {
  "architectures": [
    "InternVLForConditionalGeneration"
  ],
  "downsample_ratio": 0.5,
  "image_seq_length": 256,
  "image_token_id": 151667,
  "model_type": "internvl",
  "projector_hidden_act": "gelu",
  "text_config": {
    "architectures": [
      "Qwen2ForCausalLM"
    ],
    "attention_dropout": 0.0,
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "hidden_act": "silu",
    "hidden_size": 3584,
    "initializer_range": 0.02,
    "intermediate_size": 18944,
    "layer_types": [
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention"
    ],
    "max_position_embeddings": 32768,
    "max_window_layers": 70,
    "model_type": "qwen2",
    "num_attention_heads": 28,
    "num_hidden_layers": 28,
    "num_key_value_heads": 4,
    "rms_norm_eps": 1e-06,
    "rope_scaling": {
      "factor": 2.0,
      "rope_type": "dynamic",
      "type": "dynamic"
    },
    "rope_theta": 1000000.0,
    "sliding_window": null,
    "torch_dtype": "bfloat16",
    "use_cache": true,
    "use_sliding_window": false,
    "vocab_size": 151674
  },
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.0.dev0",
  "vision_config": {
    "architectures": [
      "InternVisionModel"
    ],
    "attention_bias": true,
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.0,
    "hidden_size": 1024,
    "image_size": [
      448,
      448
    ],
    "initializer_factor": 0.1,
    "initializer_range": 1e-10,
    "intermediate_size": 4096,
    "layer_norm_eps": 1e-06,
    "layer_scale_init_value": 0.1,
    "model_type": "internvl_vision",
    "norm_type": "layer_norm",
    "num_attention_heads": 16,
    "num_channels": 3,
    "num_hidden_layers": 24,
    "patch_size": [
      14,
      14
    ],
    "projection_dropout": 0.0,
    "torch_dtype": "bfloat16",
    "use_absolute_position_embeddings": true,
    "use_mask_token": false,
    "use_mean_pooling": true,
    "use_qk_norm": false
  },
  "vision_feature_layer": -1,
  "vision_feature_select_strategy": "default"
}

Traceback (most recent call last):
  File "/code/LLaMA-Factory/scripts/vllm_infer_intern.py", line 211, in <module>
    fire.Fire(vllm_infer)
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/code/LLaMA-Factory/scripts/vllm_infer_intern.py", line 124, in vllm_infer
    llm = LLM(**engine_args)
          ^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/utils.py", line 1022, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 242, in __init__
    self.llm_engine = self.engine_class.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 486, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1165, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1085, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 366, in __init__
    self.multimodal_config = self._init_multimodal_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.local/lib/python3.11/site-packages/vllm/config.py", line 431, in _init_multimodal_config
    raise ValueError("`limit_mm_per_prompt` is only supported for "
ValueError: `limit_mm_per_prompt` is only supported for multimodal models.

WjzZwd avatar Jun 24 '25 06:06 WjzZwd

I found out why!!!! In /mnt/bn/search-nlp-us/zhaowending/code/LLaMA-Factory/src/llamafactory/model/loader.py, change

processor = AutoProcessor.from_pretrained(model_args.model_name_or_path, **init_kwargs)

to

        init_kwargs["trust_remote_code"] = True
        processor = AutoProcessor.from_pretrained(
            'OpenGVLab/InternVL3-8B-hf',
            **init_kwargs
        )

### But!!!! The important part is!!!!!

I converted internvl3-8B, not internvl3-8B-hf, and after the conversion I kept doing all subsequent steps inside the internvl3-8B folder. With that, I can run inference with the vllm script under scripts/. Also note that I replaced five config files of internvl3-8B with the config files from internvl3-8B-hf and disabled some version & model checks. If needed, you can post your error and we can discuss it together.
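
If you want that loader.py hack to be a little less hard-coded, one option (my suggestion; INTERNVL_HF_PROCESSOR_DIR is a made-up variable name, and model_args/init_kwargs come from the surrounding loader.py code) is to let an environment variable decide where the processor comes from:

import os

# Fall back to the normal model path unless an override directory is provided.
processor_path = os.getenv("INTERNVL_HF_PROCESSOR_DIR", model_args.model_name_or_path)
init_kwargs["trust_remote_code"] = True
processor = AutoProcessor.from_pretrained(processor_path, **init_kwargs)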

I have still run into a few bugs and am still debugging, but thanks @Kuangdd01 for the patient answers!

WjzZwd avatar Jun 24 '25 06:06 WjzZwd