
Documenting a fix for a bug when running minicpm-o-2.6

Open thughy opened this issue 11 months ago • 5 comments

For example, to run the code given in the Multimodal Live Streaming section of the README, using the following code as-is definitely won't work:

```python
import math
import numpy as np
from PIL import Image
from moviepy.editor import VideoFileClip
import tempfile
import librosa
import soundfile as sf

def get_video_chunk_content(video_path, flatten=True):
    video = VideoFileClip(video_path)
    print('video_duration:', video.duration)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
        temp_audio_file_path = temp_audio_file.name
        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
    num_units = math.ceil(video.duration)

    # 1 frame + 1 second of audio per content unit
    contents = []
    for i in range(num_units):
        frame = video.get_frame(i+1)
        image = Image.fromarray((frame).astype(np.uint8))
        audio = audio_np[sr*i:sr*(i+1)]
        if flatten:
            contents.extend(["<unit>", image, audio])
        else:
            contents.append(["<unit>", image, audio])

    return contents

video_path = "/path/to/video"
sys_msg = model.get_sys_prompt(mode='omni', language='en')

contents = get_video_chunk_content(video_path)
msg = {"role": "user", "content": contents}
msgs = [sys_msg, msg]

generate_audio = True
output_audio_path = 'output.wav'

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.5,
    max_new_tokens=4096,
    omni_input=True,  # please set omni_input=True for omni inference
    use_tts_template=True,
    generate_audio=generate_audio,
    output_audio_path=output_audio_path,
    max_slice_nums=1,
    use_image_id=False,
    return_dict=True
)
print(res)
```
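As a quick sanity check of the unit-chunking logic above, the `<unit>`/frame/audio interleaving can be exercised with synthetic data (no video file or model needed; the placeholder frame strings are illustrative, not part of the real API):

```python
import math
import numpy as np

sr = 16000                # matches the 16 kHz audio extraction above
duration = 3.4            # stand-in for video.duration, in seconds
audio_np = np.zeros(int(sr * duration), dtype=np.float32)

num_units = math.ceil(duration)   # one unit per started second -> 4
contents = []
for i in range(num_units):
    image = f"frame@{i + 1}s"               # placeholder for the PIL image
    audio = audio_np[sr * i:sr * (i + 1)]   # 1-second audio slice
    contents.extend(["<unit>", image, audio])

print(num_units, len(contents))   # 4 units, 3 items per unit -> 12
```

Note that the final audio slice is shorter than one second (0.4 s here), which is also what happens in the real function whenever the video duration is not a whole number of seconds.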

thughy avatar Jan 16 '25 11:01 thughy

In practice, the following code needs to be added before the snippet above for it to run:

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.manual_seed(100)

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa',  # sdpa or flash_attention_2, no eager
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()
model.init_tts()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)
print('finish loading model')
```

thughy avatar Jan 16 '25 11:01 thughy

Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

YuzaChongyi avatar Jan 16 '25 12:01 YuzaChongyi

> Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

Is there a demo of running minicpm-o-2.6 with vLLM? I deployed it following the README, but got all kinds of errors when calling it; requests in the documented format don't go through.

cheng358 avatar Jan 17 '25 07:01 cheng358

> Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

For example, this request:

```shell
curl --location --request POST 'http://101.230.144.224:12341/v1/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "MiniCPM",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "图中人物是什么性别" },
          { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgA" } }
        ]
      }
    ],
    "stream": false
  }'
```

This works fine on 2.6, but on o-2.6 it returns:

```
"object": "error", "message": "[{'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'model': 'MiniCPM', 'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': '图中人物是什么性别'}, {'type': 'image_url', 'image_url': {'url': 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/wAARCAHmAwADAREAAhEBAxEB/
```

Due to length limits, I've pasted only part of the image's base64 data.
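For what it's worth, the `'loc': ('body', 'prompt')` part of that error is the validation failure vLLM's OpenAI-compatible server produces when a chat-style `messages` body is posted to `/v1/completions`, which expects a top-level `prompt` field; `messages`-style bodies belong to `/v1/chat/completions`. A minimal sketch of the two body shapes (model name, text, and the `pick_endpoint` helper are illustrative, not from this thread):

```python
# /v1/completions expects a top-level "prompt" string
completions_body = {"model": "MiniCPM", "prompt": "Hello"}

# /v1/chat/completions expects a "messages" list
chat_body = {
    "model": "MiniCPM",
    "messages": [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}],
    "stream": False,
}

def pick_endpoint(body: dict) -> str:
    """Route a request body to the endpoint whose schema it matches."""
    return "/v1/chat/completions" if "messages" in body else "/v1/completions"

print(pick_endpoint(chat_body))  # /v1/chat/completions
```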

cheng358 avatar Jan 17 '25 07:01 cheng358

We've updated the minicpmo branch of vllm; please pull the latest code and try again. You can also refer to https://github.com/OpenBMB/MiniCPM-o/issues/742

YuzaChongyi avatar Jan 18 '25 08:01 YuzaChongyi

> Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

> Is there a demo of running minicpm-o-2.6 with vLLM? I deployed it following the README, but got all kinds of errors when calling it; requests in the documented format don't go through.

o-2.6 can now be used through the official vLLM wheel (>=0.7.1); you can directly run the official examples: https://docs.vllm.ai/en/latest/getting_started/examples/examples_index.html

HwwwwwwwH avatar Feb 17 '25 08:02 HwwwwwwwH

Video input still doesn't work through vLLM (vllm 0.9.3). Symptom: I waited 30 minutes without any response; it may have hung.

```python
data = {
    "model": "model",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "请描述这个视频"},
                {
                    "type": "video_url",
                    "video_url": {
                        "url": f"data:video/mp4;base64,{video_base64}",
                    },
                },
            ],
        },
    ],
    "max_tokens": 200,
    "temperature": 0,
    "stop_token_ids": [151645, 151643],
}
```
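When a request appears to hang like this, a client-side timeout at least turns the indefinite wait into a fast failure so the server can be investigated. A stdlib-only sketch, assuming an OpenAI-compatible server; the URL and body here are placeholders, not taken from this thread:

```python
import json
import urllib.error
import urllib.request

url = "http://localhost:8000/v1/chat/completions"  # placeholder server address
body = {
    "model": "model",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 16,
}
req = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    # timeout raises instead of blocking forever if the server is stuck
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(resp.status)
except (urllib.error.URLError, TimeoutError) as exc:
    print("request failed or timed out:", exc)
```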

whk6688 avatar Aug 20 '25 09:08 whk6688