ms-swift Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6

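Model: https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6

Fine-tuning a multimodal LLM usually relies on a custom dataset. Here we show a demo that can be run directly.

Before you start fine-tuning, make sure your environment is ready.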

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

Model inference

CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6

<<< 你好
你好！今天我能为您提供什么帮助？
--------------------------------------------------
<<< clear
<<< <image>描述这张图片
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这张图片展示了一只小猫的特写，它有着引人注目的外貌。小猫有着大大的、圆圆的、蓝色的眼睛，看起来充满了好奇和天真。它的毛色主要是白色，带有灰色和黑色的条纹，特别是在脸部和耳朵周围，这些地方的条纹更加明显。小猫的耳朵竖立着，尖尖的，内侧是粉红色的。它的胡须又长又白，从脸颊上伸出来。小猫的鼻子是粉红色的，嘴巴微微张开，露出一点粉红色的舌头。背景模糊，将焦点集中在小猫身上，暗示着一个室内环境，柔和的光线照亮了小猫的毛发。
--------------------------------------------------
<<< clear
<<< <video>描述这段视频
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
这段视频展示了一个年幼的孩子坐在床上，专心阅读一本书。孩子戴着深色眼镜，穿着浅蓝色无袖上衣和粉色裤子。床上铺着白色床单，孩子旁边放着一件白色衣物。背景中有一个木制婴儿床，暗示着一个家庭环境。房间光线柔和，氛围平静。视频中没有明显的动作或活动，孩子似乎完全沉浸在阅读中。

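Image fine-tuning

We fine-tune on the coco-en-mini dataset, whose task is to describe image content. You can find the dataset on modelscope: https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary

# By default, lora_target_modules is set to all linear layers of the llm and the resampler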
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

To use a custom dataset, specify it as follows:

  --dataset train.jsonl \
  --val_dataset val.jsonl \

Custom datasets may be in json or jsonl format. A sample custom dataset:

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": []}
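
The samples above suggest that each `<image>` tag corresponds, in order, to one entry in `images`. Assuming that convention (it is inferred from the samples, not stated in the post, and counting tags inside `history` is a further assumption), a small checker can catch mismatched records before training. The helper names `count_tags`, `check_record`, and `check_jsonl_lines` are hypothetical and not part of swift.

```python
import json


def count_tags(record, tag='<image>'):
    """Count media tags in the current query and in every history query."""
    n = record['query'].count(tag)
    for past_query, _past_response in record.get('history') or []:
        n += past_query.count(tag)
    return n


def check_record(record, tag='<image>', key='images'):
    """Return a list of problems found in one sample (empty list means OK)."""
    problems = []
    for field in ('query', 'response'):
        if not isinstance(record.get(field), str):
            problems.append(f'missing or non-string field: {field}')
    if problems:
        return problems
    n_tags = count_tags(record, tag)
    n_media = len(record.get(key) or [])
    if n_tags != n_media:
        problems.append(f'{n_tags} {tag} tag(s) but {n_media} path(s) in {key}')
    return problems


def check_jsonl_lines(lines, **kwargs):
    """Check each non-empty JSONL line; return {line_number: problems}."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if line.strip():
            problems = check_record(json.loads(line), **kwargs)
            if problems:
                report[lineno] = problems
    return report


samples = [
    '{"query": "<image>55555", "response": "66666", "images": ["image_path"]}',
    # Deliberately broken: one tag, no image path.
    '{"query": "eeeee<image>eeeee", "response": "fffff", "history": [], "images": []}',
]
print(check_jsonl_lines(samples))
```

The same check can be pointed at video data by passing `tag='<video>', key='videos'`.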

GPU memory usage:

Inference script after fine-tuning:

# To run on the full validation set, set: `--show_dataset_sample -1`
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

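Example of the fine-tuned model running inference on the validation set (due to time constraints, only 300 steps were trained):

Video fine-tuning

We fine-tune on the video-chatgpt dataset, whose task is to describe video content. You can find the dataset on modelscope: https://modelscope.cn/datasets/swift/VideoChatGPT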

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset video-chatgpt \
  --deepspeed default-zero2

Custom datasets may be in json or jsonl format. A sample custom dataset:

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "videos": []}
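
Records in this shape can be generated from a list of (video path, description) pairs. The following is a minimal sketch under stated assumptions: the prompt text, the placeholder paths, the 80/20 split, and the helper names are all illustrative and not taken from swift. It writes a training file and a held-out file of the kind passed to `--dataset` and `--val_dataset`.

```python
import json
import os
import random
import tempfile


def build_video_records(pairs, prompt='<video>Describe this video.'):
    """Turn (video_path, description) pairs into single-turn samples."""
    return [{'query': prompt, 'response': text, 'videos': [path]}
            for path, text in pairs]


def split_records(records, val_ratio=0.1, seed=42):
    """Shuffle deterministically and hold out a validation split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, 'w', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')


# Placeholder data; replace with real paths and captions.
pairs = [(f'videos/clip_{i}.mp4', f'caption {i}') for i in range(10)]
train, val = split_records(build_video_records(pairs), val_ratio=0.2)

out_dir = tempfile.mkdtemp()
write_jsonl(train, os.path.join(out_dir, 'train.jsonl'))
write_jsonl(val, os.path.join(out_dir, 'val.jsonl'))
print(len(train), len(val), out_dir)
```

A fixed shuffle seed keeps the split stable across runs, so validation results stay comparable between experiments.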

GPU memory usage:

Inference script after fine-tuning:

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

Example of the fine-tuned model running inference on the validation set (due to time constraints, only 50 steps were trained):

Aug 06 '24 14:08 Jintao-Huang

Are the multi-image understanding and in-context features from the official documentation supported in the swift API?

Aug 07 '24 15:08 demoninpiano

Both multi-image and multi-turn are supported.

For multiple images, just use multiple <image> tags. See the custom dataset format above.

Aug 07 '24 15:08 Jintao-Huang

Which version of swift do I need to upgrade to?

Aug 08 '24 12:08 guihonghao

It is still only on the main branch.

Aug 08 '24 12:08 Jintao-Huang

Could you provide code for single-sample video inference?

Aug 11 '24 08:08 compleXuan

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<video>描述这段视频'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: <video>描述这段视频
response: 这段视频展示了一个年幼的孩子，可能是一个蹒跚学步的幼儿，坐在床上专心阅读一本书。孩子戴着深色眼镜，穿着浅绿色无袖上衣和粉色裤子。床上铺着白色床单，背景中有一个木制婴儿床，暗示着一个家庭环境。房间光线充足，氛围温馨舒适。孩子专注的表情和姿势表明他们对书本内容很投入。
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写，它有着引人注目的面部特征。小猫的毛色主要是白色，带有灰色和黑色的条纹，特别是在眼睛周围和耳朵上。它的眼睛又大又圆，有着蓝色的虹膜，看起来非常好奇或专注。小猫的耳朵竖立着，内耳是粉红色的，与毛色形成对比。小猫的鼻子是粉红色的，有着小小的黑色鼻子，嘴巴微微张开，露出一点粉红色的舌头。小猫的胡须又长又白，从脸颊上伸出来。背景模糊，将焦点集中在小猫身上，暗示着一个室内环境，有自然光线，可能来自窗户。
"""

Aug 12 '24 02:08 Jintao-Huang

Does swift support the official few-shot inference method?

Aug 12 '24 02:08 okideal

is this included in documentation somewhere...

Aug 12 '24 02:08 yingdachen

Thank you for the excellent suggestions. We will update the document within this week.

Aug 12 '24 03:08 Jintao-Huang

Using vllm:

pip install "vllm>=0.5.4"

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
generation_info = {}
request_list = [{'query': query, 'images': images} for _ in range(100)]  # example of batch inference
resp_list = inference_vllm(vllm_engine, template, request_list, generation_info=generation_info, use_tqdm=True)
print(f'query: {query}')
print(f'response: {resp_list[0]["response"]}')
print(generation_info)

# Streaming
generation_info = {}
gen = inference_stream_vllm(vllm_engine, template, request_list, generation_info=generation_info)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
# only show first
for resp_list in gen:
    resp = resp_list[0]
    if resp is None:
        continue
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(generation_info)
"""
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 91.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00,  4.48it/s]
query: <image>描述这张图片
response: 这张图片展示了一只小猫咪的特写，可能是美国短毛猫品种，因为其花纹和毛发质地。猫咪有着引人注目的蓝色眼睛，这是其外貌中非常突出的特征。它皮毛上有着独特的黑色条纹，从面颊延伸至头顶，暗示着一种有条纹的花纹图案。它的耳朵小而尖，内侧是粉色的。猫咪的胡须细长而突出，围绕在它的下颌两侧和眼睛周围。猫咪坐着，用一种表达丰富的方式直视着，嘴巴微微张开，露出粉红色的内唇。背景模糊，柔和的光线增强了猫咪的特征。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14734, 'num_samples': 100, 'runtime': 23.53027338697575, 'samples/s': 4.249844375176322, 'tokens/s': 626.1720702384794}
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写，可能是一只幼年猫，在模糊的背景中，集中注意力在猫的表情上。这只猫长着一身白色与黑色条纹相间的毛皮，带有微妙的灰褐色。它的眼睛大而圆，具有高度的反光度，表明它们可能含有异色瞳，即一只眼睛是蓝色的，另一只是绿色的，但这只猫两只眼睛都是绿色的。睫毛清晰可见，增添了一种生动的表情。猫的耳朵竖立着，内部呈粉红色，边缘有浅色的阴影，显示出柔软的毛发。胡须又长又明显，突显了小猫的脸部形状。这个品种的猫看起来是一个常见品种，毛皮图案和眼睛颜色表明它可能是一只虎斑猫。光线柔和，产生一种天鹅绒般的效果，突出了猫绒毛的质感。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14986, 'num_samples': 100, 'runtime': 23.375922130944673, 'samples/s': 4.277906105257837, 'tokens/s': 641.0870089339394}
"""
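
The throughput figures in `generation_info` are consistent with simple ratios over `runtime`. That reading is inferred from the numbers of the first run above, not from swift's source, so treat it as an assumption. A quick check:

```python
# Values copied from the first generation_info dict printed above.
info = {
    'num_prompt_tokens': 2700,
    'num_generated_tokens': 14734,
    'num_samples': 100,
    'runtime': 23.53027338697575,
}

# Assumed definitions: both rates divide by wall-clock runtime in seconds,
# and tokens/s counts generated tokens only (prompt tokens excluded).
samples_per_s = info['num_samples'] / info['runtime']
tokens_per_s = info['num_generated_tokens'] / info['runtime']
avg_generated_len = info['num_generated_tokens'] / info['num_samples']

print(f'samples/s: {samples_per_s:.4f}')
print(f'tokens/s:  {tokens_per_s:.2f}')
print(f'average generated tokens per sample: {avg_generated_len:.2f}')
```

Both rates match the reported `samples/s` and `tokens/s` to within rounding, which suggests the reported token rate measures generation speed rather than total tokens processed.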

Aug 12 '24 05:08 Jintao-Huang

Fine-tuning minicpm-v-v2_6-chat fails with the following error:

  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Fine-tuning other models works. The fine-tuning command is:

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset **.jsonl \
  --deepspeed default-zero2

@Jintao-Huang

Aug 12 '24 07:08 samaritan1998

Could you provide code for Async + vLLM inference with minicpmv2-6?

Aug 12 '24 10:08 PancakeAwesome

swift deploy uses Async + vLLM.

For how to call it from a client, see this documentation:

https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/vLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E6%96%87%E6%A1%A3.html#id4

Aug 12 '24 10:08 Jintao-Huang

CUDA_VISIBLE_DEVICES=0 swift deploy \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --infer_backend vllm

Aug 12 '24 10:08 Jintao-Huang

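That documentation shows how to call the service with the openai client. Isn't the openai client synchronous? For asynchronous calls, wouldn't the code need the asyncio package?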

Aug 12 '24 10:08 PancakeAwesome

Server:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192

Client:

import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)

query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
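# Pass images explicitly; the call as originally posted omitted images=, and the recorded response0/response1 below come from that run
tasks = [inference_client_async(model_type, query, images=images, request_config=request_config) for _ in range(100)]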
async def _batch_run(tasks):
    return await asyncio.gather(*tasks)

resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')

query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']

async def _stream():
    global query
    request_config = XRequestConfig(seed=42, stream=True)
    stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
    print(f'query: {query}')
    print('response: ', end='')
    async for chunk in stream_resp:
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()

asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
"""
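
The batch example above starts all 100 requests at once with `asyncio.gather`. When the server should not be flooded, the number of in-flight requests can be capped with `asyncio.Semaphore`. The sketch below shows only that pattern: `fake_request` is a hypothetical stand-in for a real client call such as `inference_client_async`, so it runs without a server, and the limit of 4 is arbitrary.

```python
import asyncio


async def fake_request(i):
    """Hypothetical stand-in for one asynchronous inference request."""
    await asyncio.sleep(0.01)
    return f'response {i}'


async def run_bounded(n_requests, max_concurrency):
    """Run n_requests coroutines with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def guarded(i):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_request(i)
            finally:
                in_flight -= 1

    # gather returns results in the order the coroutines were passed in.
    results = await asyncio.gather(*(guarded(i) for i in range(n_requests)))
    return results, peak


results, peak = asyncio.run(run_bounded(n_requests=20, max_concurrency=4))
print(len(results), 'responses, peak concurrency', peak)
```

Because `gather` preserves input order, `results[i]` still belongs to request `i` even though requests finish at different times.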

Aug 12 '24 11:08 Jintao-Huang

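Thank you very much, jintao-huang.

How do I start the service with the Python SDK?
How can I make sure every asynchronous request returns a different result? I see that the seed is always the same.
Does the code above also work for other multimodal models, such as internvl2?

Looking forward to your reply, thank you!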

Aug 12 '24 12:08 PancakeAwesome

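How to start the service with the Python SDK

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# Same arguments as swift deploy
deploy_main(DeployArguments(...))

Making sure every asynchronous request returns a different result

Leave seed as None (the default).

Whether the code above also works for other multimodal models

Yes.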

Aug 12 '24 12:08 Jintao-Huang

Can I start a vllm service through the get_vllm_engine interface? How does that differ from the deploy_main approach?

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

Aug 12 '24 12:08 PancakeAwesome

The issue where starting a minicpmv2-6 service with vllm required flash-attn to be installed has been fixed.

Aug 12 '24 13:08 Jintao-Huang

Calling the deploy_main SDK with the same arguments as the CLI raises an error:

INFO: 2024-08-12 23:36:53,874 vllm_utils.py:567] generation_config: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=0.7, top_k=20, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=False, spaces_between_special_tokens=True, truncate_prompt_tokens=None)
INFO: 2024-08-12 23:36:53,876 vllm_utils.py:578] system: You are a helpful assistant.
INFO:     Started server process [298157]
INFO:     Waiting for application startup.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
INFO:     Application startup complete.
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/ossfs/workspace/ms-swift-main/swift/llm/deploy.py", line 70, in <lambda>
INFO:     Uvicorn running on http://127.0.0.1:8000/ (Press CTRL+C to quit)
    thread = Thread(target=lambda: asyncio.run(_log_stats_hook(_args.log_interval)))
  File "/opt/conda/lib/python3.8/site-packages/nest_asyncio.py", line 27, in run
    loop = asyncio.get_event_loop()
  File "/opt/conda/lib/python3.8/asyncio/events.py", line 639, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'Thread-7'.
/opt/conda/lib/python3.8/threading.py:934: RuntimeWarning: coroutine '_log_stats_hook' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
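
The traceback shows `asyncio.get_event_loop()` being reached (through the `asyncio.run` patched by nest_asyncio) inside a worker thread that has no event loop. A general way to run a coroutine in a worker thread is to give that thread its own loop. The sketch below illustrates only this general pattern; `log_stats` is a hypothetical stand-in for `_log_stats_hook`, and this is not necessarily the fix that was applied in swift.

```python
import asyncio
import threading


async def log_stats(results):
    """Hypothetical stand-in for a periodic stats-logging coroutine."""
    await asyncio.sleep(0)
    results.append('stats logged')


def run_in_thread(results):
    """Create a loop that belongs to this thread, run the coroutine, clean up."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(log_stats(results))
    finally:
        loop.close()


results = []
worker = threading.Thread(target=run_in_thread, args=(results,))
worker.start()
worker.join()
print(results)
```

Creating the loop explicitly avoids relying on `get_event_loop()` finding one, which only works implicitly in the main thread.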

Aug 12 '24 15:08 PancakeAwesome

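The OP has already fixed this on the latest main branch; install it with pip install -e '.[all]'.
The difference is that a service started through the Python SDK's get_vllm_engine cannot be called asynchronously, while a vllm service started from the CLI is an Async service by default and can be called asynchronously.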
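
If a synchronous engine has to be used from asynchronous code anyway, one common workaround is to push the blocking call onto a worker thread with `run_in_executor`, so the event loop stays responsive. This is a sketch of that general pattern only: `blocking_infer` is a hypothetical stand-in for a synchronous call such as `inference_vllm`, and whether a real vLLM engine tolerates this usage is not covered by the thread. A single worker is used so that calls into the engine stay serialized.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_infer(prompt):
    """Hypothetical stand-in for a synchronous, blocking inference call."""
    time.sleep(0.02)
    return f'echo: {prompt}'


async def infer_async(executor, prompt):
    """Await a blocking call without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_infer, prompt)


async def main():
    # One worker thread: requests queue up and reach the engine one at a time.
    with ThreadPoolExecutor(max_workers=1) as executor:
        prompts = [f'q{i}' for i in range(4)]
        return await asyncio.gather(*(infer_async(executor, p) for p in prompts))


results = asyncio.run(main())
print(results)
```

This keeps the calling code asynchronous but does not add batching or parallelism inside the engine; the deployed Async service remains the route the thread recommends.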

Aug 12 '24 15:08 PancakeAwesome

Does vLLM with asynchronous client calls support the official few-shot feature? The few-shot usage is shown below, taken from https://huggingface.co/openbmb/MiniCPM-V-2_6#in-context-few-shot-learning

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

question = "production date" 
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)

Aug 12 '24 16:08 PancakeAwesome

Yes, it is supported. This is just multi-turn dialogue.

Aug 12 '24 16:08 Jintao-Huang

With asynchronous vllm calls, how should few-shot be written? How should the parameters be passed?
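
Since few-shot is described in this thread as multi-turn dialogue, one plausible mapping is to put each demonstration into `history` and list all images in order. The sketch below only builds that structure. It assumes that `<image>` tags inside history turns are matched to `images` in order, which the thread does not confirm, and `few_shot_to_request` is a hypothetical helper. Whether the installed client accepts a history argument should be checked against its signature.

```python
def few_shot_to_request(examples, test_image, question):
    """Map few-shot demonstrations onto a query/history/images structure.

    examples: list of (image_path, answer) demonstrations.
    Assumption: every <image> tag, including those in history,
    consumes the next entry of `images` in order.
    """
    history = [[f'<image>{question}', answer] for _image, answer in examples]
    images = [image for image, _answer in examples] + [test_image]
    return {'query': f'<image>{question}', 'history': history, 'images': images}


request = few_shot_to_request(
    examples=[('example1.jpg', '2023.08.04'), ('example2.jpg', '2007.04.24')],
    test_image='test.jpg',
    question='production date',
)
print(request)
```

The resulting dict has the same fields as the custom dataset format, so the same structure could also serve as a training sample once a `response` field is added.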

Aug 13 '24 03:08 PancakeAwesome

Multi-GPU sft on V100 with use_flash_attn set to false still raises an error saying flash-attn must be installed. The error is raised at: get_class_from_dynamic_module('modeling_navit_siglip.SiglipVisionTransformer', model_dir)

Aug 13 '24 06:08 moyans

I'll add an except ImportError.

Aug 13 '24 06:08 Jintao-Huang

Solved it by following https://github.com/OpenBMB/MiniCPM-V/pull/461/, thanks.

Aug 13 '24 07:08 moyans

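Hello, we tested the command you provided, CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path openbmb/MiniCPM-V-2_6, as well as the video test code. The video results appear to depend only on the first frame. We ran OCR extraction on videos several times, and each time the output contained only the first frame's OCR result. Could you share the location of the actual test code (.py file)? We would like to check the data-processing part to see whether only the first frame of the video is being read.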

Aug 14 '24 12:08 Wuyingwen

https://github.com/modelscope/ms-swift/blob/main/swift/llm/utils/template.py#L2594

Try pulling the main branch and testing again. A new release should be out tomorrow.
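
When checking whether a pipeline reads more than the first frame, it can help to compare against an explicit sampler. Below is a sketch of uniform frame sampling. It is illustrative only, it is not the code at the linked template.py line, and `sample_frame_indices` is a hypothetical name.

```python
def sample_frame_indices(num_frames, max_frames):
    """Pick up to max_frames indices spread evenly over a video.

    Each index is taken from the middle of one of max_frames equal-width
    segments, so the first and last parts of the video are both covered.
    """
    if num_frames <= 0 or max_frames <= 0:
        return []
    if num_frames <= max_frames:
        return list(range(num_frames))
    gap = num_frames / max_frames
    return [int(i * gap + gap / 2) for i in range(max_frames)]


print(sample_frame_indices(10, 4))   # [1, 3, 6, 8]
print(sample_frame_indices(3, 8))    # [0, 1, 2]
```

If a video's OCR output never changes when later frames contain different text, printing the indices actually decoded and comparing them with a list like this is a quick way to spot a sampler that stops at frame 0.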

Aug 14 '24 13:08 Jintao-Huang