[Bug] InternVLChatModel.batch_chat() does not set template.system_message
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
In InternVLChatModel.chat(), template.system_message is overridden by model.system_message:
https://github.com/OpenGVLab/InternVL/blob/6a230b34cc04eb2ee51c3ea013362a57ab6a6dc9/internvl_chat/internvl/model/internvl_chat/modeling_internvl_chat.py#L316-L317
whereas this step is missing from InternVLChatModel.batch_chat():
https://github.com/OpenGVLab/InternVL/blob/6a230b34cc04eb2ee51c3ea013362a57ab6a6dc9/internvl_chat/internvl/model/internvl_chat/modeling_internvl_chat.py#L277-L280
As a result, changes to model.system_message and model.conv_template.system_message have no effect during batch_chat inference. A model fine-tuned on SFT data that includes a system_message therefore behaves noticeably differently between chat and batch_chat; after manually adding template.system_message = self.system_message inside batch_chat, the two become consistent again.
This issue exists both in https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/internvl/model/internvl_chat/modeling_internvl_chat.py and in the modeling_internvl_chat.py hosted on HF. Please fix it.
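For reference, one way to fix this is to apply the same assignment that chat() already performs (L316-L317 linked above) where batch_chat() builds its conversation template. A minimal sketch of the missing line in context:

# In InternVLChatModel.batch_chat(), right after the template is created
# (sketch of the proposed fix, mirroring what chat() already does):
template = get_conv_template(self.template)
template.system_message = self.system_message  # currently missing in batch_chat()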
Reproduction
import torch
from transformers import AutoModel, AutoTokenizer

# Note: load_image is the image-preprocessing helper from the InternVL
# quick-start examples, and images is a list of image paths; both are
# assumed to be defined before this snippet.
model = (
    AutoModel.from_pretrained(
        "my_finetuned_ckpt",
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )
    .eval()
    .cuda()
)
model.system_message = "my_custom_system_message"
model.conv_template.system_message = "my_custom_system_message"
tokenizer = AutoTokenizer.from_pretrained("my_finetuned_ckpt", trust_remote_code=True)

pixel_values = [load_image(image, max_num=1) for image in images]
num_patches_list = [pixel_value.size(0) for pixel_value in pixel_values]
pixel_values = torch.cat(pixel_values, dim=0)
questions = ["my_custom_question"] * len(num_patches_list)

generation_config = dict(
    num_beams=1,
    max_new_tokens=1024,
    do_sample=False,
)
responses = model.batch_chat(
    tokenizer,
    pixel_values,
    num_patches_list=num_patches_list,
    questions=questions,
    generation_config=generation_config,
)
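To observe the discrepancy directly, the same inputs can be run through chat() one sample at a time and compared with the batch_chat() responses (a sketch, reusing the variables above):

# chat() does honor model.system_message, so with the bug present these
# single-sample outputs differ noticeably from the batch_chat() responses;
# after adding template.system_message = self.system_message to batch_chat(),
# the two match again.
for question, pv in zip(questions, pixel_values.split(num_patches_list)):
    single_response = model.chat(tokenizer, pv, question, generation_config)
    print(single_response)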
Environment
sys.platform: linux
Python: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cu121
...
Error traceback
No response