[Bug] MiniCPM-V-2.6 inference results from HF and lmdeploy are inconsistent
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
The inference results of MiniCPM-V-2.6 under HF and lmdeploy are inconsistent; they still differ even when reproducing with top_k = 1.
Reproduction steps
Prepare the data:
wget "https://support.huaweicloud.com/api-ocr/zh-cn_image_0000001698774808.png"
This downloads zh-cn_image_0000001698774808.png into the current directory.
LMDeploy reproduction
- Use the image openmmlab/lmdeploy:v0.6.0a0-cu12
- Run the following code:
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(max_batch_size=1, cache_max_entry_count=0.4)
pipe = pipeline('/data/models/openbmb/MiniCPM-V-2_6/', log_level='INFO', backend_config=backend_config)

image_path = "zh-cn_image_0000001698774808.png"
prompt = "请详细识别图中的内容并以 markdown 格式返回"
messages = [
    dict(role='user', content=[
        dict(type='text', text=prompt),
        dict(type='image_url', image_url=dict(url=image_path)),
    ])
]
gen_config = GenerationConfig(top_p=1, top_k=1, temperature=0.1, repetition_penalty=1.05, max_new_tokens=4096)
out = pipe(messages, gen_config=gen_config)
print(out.text)
Model output:
这张图片展示了一份门诊检验报告单,具体内容如下:
**标题:**
门诊检验报告单
**副标题:**
血常规(5分类)
**状态说明:**
标本状态:正常
**临床诊断:**
1. 慢性扁桃体炎
**检验项目列表及结果:**
- 中性细胞百分率 (NEL%):77.1%
- 参考范围:40-75%
- 淋巴细胞百分率 (LYM%):8.8%
- 参考范围:20-50%
- 单核细胞百分率 (MONO%):7.1%
- 参考范围:3.0-10.0%
- 红细胞计数 (RBC):6.66
- 参考范围:4.3-5.8%
**签名区域:**
送检医生:
检验者:
审核者:
这份报告详细列出了患者的血常规检查结果,并根据参考范围对各项指标进行了评估。
HF reproduction
Code:
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('/data/models/openbmb/MiniCPM-V-2_6/', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('/data/models/openbmb/MiniCPM-V-2_6/', trust_remote_code=True)

def chat_llm(image_path, prompt):
    image = Image.open(image_path).convert('RGB')
    message = [{'role': 'user', 'content': [image, prompt]}]
    res = model.chat(
        image=None,
        msgs=message,
        tokenizer=tokenizer,
        temperature=0.1,
        top_p=1,
        top_k=1,
        do_sample=True,
        repetition_penalty=1.05,
    )
    print(res)
    print("==============")

prompt = "请详细识别图中的内容并以 markdown 格式返回"
chat_llm("zh-cn_image_0000001698774808.png", prompt)
Model output:
这张图片展示了一份门诊检验报告单,具体内容如下:
**标题:门诊检验报告单**
**副标题:血常规(5分类)**
**标本状态:正常**
**临床诊断:1.慢性扁桃体炎**
| 检验项目 | 结果 | 参考范围 | 单位 |
|---------|------|----------|------|
| 中性细胞百分率 (NEL%) | 77.1 | 40-75 | % |
| 淋巴细胞百分率 (LYM%) | 8.8 | 20-50 | % |
| 单核细胞百分率 (MONO%) | 7.1 | 3.0-10.0 | % |
| 红细胞计数 (RBC) | 6.66 | 4.3-5.8 | % |
**送检医生:**
[空白]
**检验者:**
[空白]
**审核者:**
[空白]
Reproduction
As shown above.
Environment
As shown above.
Error traceback
No response
Comparison
LMDeploy log:
prompt='<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image_id>0</image_id><image><IMAGE_TOKEN></image><slice><IMAGE_TOKEN></slice><slice><IMAGE_TOKEN></slice>\n<slice><IMAGE_TOKEN></slice><slice><IMAGE_TOKEN></slice>\n请详细识别图中的内容并以 markdown 格式返回<|im_end|>\n<|im_start|>assistant\n',
gen_config=EngineGenerationConfig(n=1, max_new_tokens=8192, top_p=1.0, top_k=1, temperature=0.1, repetition_penalty=1.05, ignore_eos=False, random_seed=4988746044838101047, stop_words=[151645], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, logits_processors=None),
prompt_token_id=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151658, 15, 151659, 151646, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151647, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 198, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 198, 14880, 100700, 102450, 28029, 101047, 43815, 62926, 23031, 50494, 51461, 120, 28330, 31526, 151645, 198, 151644, 77091, 198]
HF debug info:
input_ids: tensor([[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13,
151645, 198, 151644, 872, 198, 151658, 15, 151659, 151646,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 151647, 151656, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 151657, 151656, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 151657, 198,
151656, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 151657, 151656, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
128244, 128244, 128244, 128244, 128244, 151657, 198, 14880, 100700,
102450, 28029, 101047, 43815, 62926, 23031, 50494, 51461, 120,
28330, 31526, 151645, 198, 151644, 77091, 198]],
device='cuda:0', dtype=torch.int32)
Comparison
Apart from the ID 128244, all the other IDs are identical.
This model is supported in lmdeploy v0.6.0a0. You may upgrade to the latest version.
I just tested with the image openmmlab/lmdeploy:v0.6.0a0-cu12, and the problem still exists.
We use 0 as the placeholder for the image embedding.
@zhjunqin
Do you expect the results to be exactly the same? That is unlikely. When we added support for MiniCPM-V-2.6, I fed the input embeddings obtained from HF into lmdeploy, and even with sampling disabled the results still differed; we attribute this to numerical differences between the kernels.
Besides the differing kernel implementations, lmdeploy runs the vision part in float16, while the HF reproduction you posted uses bfloat16, which also introduces some difference. But as noted above, even with identical inputs (embeddings), the two sides' results will still diverge somewhat.
You compared the input_ids: 0 and 128244 are the two sides' image placeholders, and both are replaced by image features at the embedding stage, so if that is the only difference, the input_ids are aligned.
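For reference, a minimal sketch (my own illustration, not code from either library) of checking that the two prompts agree once the image placeholders are normalized:

```python
# Verify that the lmdeploy prompt_token_id and the HF input_ids differ only in
# the image-placeholder slots (0 for lmdeploy, 128244 for HF), which are
# replaced by image features at the embedding stage anyway.
LMDEPLOY_IMAGE_PLACEHOLDER = 0
HF_IMAGE_PLACEHOLDER = 128244

def normalize(ids, placeholder, sentinel=-1):
    return [sentinel if t == placeholder else t for t in ids]

def prompts_aligned(lmdeploy_ids, hf_ids):
    return (normalize(lmdeploy_ids, LMDEPLOY_IMAGE_PLACEHOLDER)
            == normalize(hf_ids, HF_IMAGE_PLACEHOLDER))
```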
Yes, I expect identical, deterministic results.
Right, what I wanted to show by comparing the input_ids is exactly that the input_ids on both sides are consistent.
From your analysis, the two sides simply cannot be aligned, is that right? I also noticed that the longer the generated text, the larger the divergence becomes toward the end.
@zhjunqin
Exactly. Under greedy decoding the outputs usually stay consistent at the beginning, but once any position differs, the input from that point on is no longer the same, so the chance that the rest still matches becomes even smaller.
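To quantify that effect, here is a small helper (an illustration of mine, not part of either codebase) that finds the first position where two greedy generations diverge:

```python
def first_divergence(tokens_a, tokens_b):
    """Return the index of the first differing token; after this point the two
    runs are conditioned on different prefixes, so they rarely agree again."""
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    return min(len(tokens_a), len(tokens_b))
```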
> lmdeploy runs the vision part in float16, while the HF reproduction you posted uses bfloat16, which also introduces some difference. But as noted above, even with identical inputs (embeddings), the two sides' results will still diverge somewhat.

@irexyc Hi, may I ask what data type the LLM part of lmdeploy uses?
@peace-zy
It follows the torch_dtype in config.json.
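As a quick sanity check (reusing the checkpoint path from the reproduction above), the dtype the engine will follow can be read straight from the checkpoint's config.json:

```python
import json

# The engine follows torch_dtype from the HF checkpoint's config.json.
with open('/data/models/openbmb/MiniCPM-V-2_6/config.json') as f:
    print(json.load(f).get('torch_dtype'))  # e.g. 'bfloat16' or 'float16'
```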
@irexyc Got it, thanks. Is this the relevant code? https://github.com/InternLM/lmdeploy/blob/4e5cc16682bf6a413acff493874c90c91255f8bc/lmdeploy/turbomind/deploy/target_model/base.py#L29
@peace-zy
The automatic dtype detection is in this section; currently only fp16 and bf16 can be run:
https://github.com/InternLM/lmdeploy/blob/4e5cc16682bf6a413acff493874c90c91255f8bc/lmdeploy/turbomind/deploy/converter.py#L101-L148
Alternatively, you can manually specify the engine dtype: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
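For example, a sketch of pinning the engine dtype through TurbomindEngineConfig, assuming the installed lmdeploy version already exposes the dtype field referenced in messages.py above:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# dtype='auto' follows config.json; it can also be pinned explicitly
# (assumed field name; check your lmdeploy version's messages.py).
backend_config = TurbomindEngineConfig(
    max_batch_size=1,
    cache_max_entry_count=0.4,
    dtype='bfloat16',  # or 'float16'
)
pipe = pipeline('/data/models/openbmb/MiniCPM-V-2_6/', backend_config=backend_config)
```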
@irexyc Got it, thank you.
@irexyc Why is the vision part forced to float16? Does bf16 cause problems? I also ran into a strange phenomenon with the InternVL2 8B model: on an H100 and an A100 80G, with the same code, the same Docker image, and temperature set to 0,
1. the H100 result without flash attention only matches the A100 result with flash attention enabled;
2. with flash attention enabled on both GPUs the results differ, and with it disabled on both they also differ.
What could be the reason?
Because the engine side used to accept inputs as numpy arrays, and numpy has no bf16 dtype; also, not every GPU supports bf16, so for simplicity the vision part was forced to fp16.
For the results to match exactly, the hardware must be identical and the kernel implementations must be the same; only then is there no error at all. There is no need to worry about whether a single sample matches exactly. If you are concerned, you can evaluate accuracy with opencompass; if that aligns, everything is fine.
@irexyc Thanks a lot.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.