Question about converting BAAI / bge-reranker-base to ONNX

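As the title says: when I convert the model provided on Hugging Face to ONNX, the generated ONNX model only outputs logits at runtime, not the classification score.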

Conversion code:

```python
import torch
from transformers import BertForSequenceClassification
import onnx
from transformers import AutoModel
import logging
logging.basicConfig(level=logging.DEBUG)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
# Load the saved model
model = AutoModel.from_pretrained('/workspace/01_triton/model_save/model_zoo/bge-reranker-base')

def make_train_dummy_input(seq_len):
    org_input_ids = torch.tensor(
        [[i for i in range(seq_len)]], dtype=torch.int32)
    org_input_mask = torch.tensor([[1 for i in range(int(
        seq_len/2))] + [1 for i in range(seq_len - int(seq_len/2))]], dtype=torch.int32)
    return (org_input_ids.to(device), org_input_mask.to(device))

model.eval()

with torch.no_grad():
    model=model.to(device)
    org_dummy_input = make_train_dummy_input(64)
    # print(org_dummy_input)
    output = torch.onnx.export(model,
                               org_dummy_input,
                               "model.onnx",
                               verbose=True,
                               opset_version=11,
                               # Mind the order! Do not change it arbitrarily, or the results will not be as expected
                               input_names=[
                                   'input_ids', 'attention_mask'],
                               # Mind the order, otherwise the wrong output_names may be used at inference time
                               output_names=['logits'],
                               do_constant_folding=True,
                               dynamic_axes={"input_ids": {0: "batch_size", 1: "sequence_length"},
                                             "attention_mask": {0: "batch_size", 1: "sequence_length"},
                                             "logits": {0: "batch_size"}
                                            }
                               )
```

Inference code for the converted model:

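```python
import onnxruntime

onnx_model_path = "./model.onnx"
session = onnxruntime.InferenceSession(onnx_model_path, providers=onnxruntime.get_available_providers())
# input_ids / attention_mask: int32 NumPy arrays (the model was exported with int32 dummy inputs)
outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
print(outputs)
```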

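The ONNX Runtime output matches the transformers output (see the screenshot below). The problem is that the exported model has no classification head. How can I get the complete ONNX model? Did I do something wrong? Thanks in advance.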

Jan 18 '24 10:01 Gcstk

The structure of the converted ONNX model is shown below.

Jan 18 '24 10:01 Gcstk

You can refer to ONNX versions published by the open-source community: https://huggingface.co/swulling/bge-reranker-large-onnx-o4

Jan 18 '24 12:01 staoxiao

Thanks a lot, this is very helpful. One question, though: why did only part of the model get exported? This is the first time I have run into this, so I'd like to understand it. Many thanks.

Jan 18 '24 15:01 Gcstk

You should use AutoModelForSequenceClassification instead of AutoModel; AutoModel does not load the classification head.

Jan 18 '24 17:01 staoxiao

I'd suggest switching to TensorRT (TRT), which gives a bigger performance gain; you can refer to the related code:

https://github.com/flyme2023/bge

Jan 23 '24 03:01 flyme2023

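Thanks so much! I was just investigating why the TRT inference results don't match ONNX. Inside NVIDIA's nvcr.io/nvidia/tensorrt:23.06-py3 container I converted the model with this command:

```shell
trtexec --onnx=/workspace/model.onnx \
    --saveEngine=/workspace/model.plan \
    --minShapes=input_ids:1x1,attention_mask:1x1 \
    --optShapes=input_ids:6x128,attention_mask:6x128 \
    --maxShapes=input_ids:24x512,attention_mask:24x512 \
    --memPoolSize=workspace:8096 \
    --fp16
```

I've tried this conversion many times, but the inference results are wrong. I'm now debugging with Polygraphy. Thanks for the reference!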

Jan 23 '24 03:01 Gcstk
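When chasing TRT-versus-ONNX mismatches like this, it can help to separate "the numbers differ slightly" from "the ranking changed", since reranker scores are usually consumed by sorting. A minimal NumPy sketch, assuming you have already collected scores for the same query/passage pairs from both backends; `compare_scores` and its default tolerance are hypothetical choices, not values taken from TensorRT or Polygraphy.

```python
import numpy as np


def compare_scores(reference, candidate, atol=1e-2):
    """Compare reranker scores from two backends (e.g. ONNX Runtime vs. TensorRT).

    Reports the max absolute difference, whether it is within `atol`,
    whether all candidate scores are finite (fp16 overflow often shows up
    as inf/NaN), and whether both backends sort the pairs the same way.
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    max_abs = float(np.max(np.abs(ref - cand)))
    # Descending order = best-scored pair first; stable sort keeps ties deterministic
    same_order = bool(np.array_equal(np.argsort(-ref, kind="stable"),
                                     np.argsort(-cand, kind="stable")))
    return {
        "max_abs_diff": max_abs,
        "within_atol": max_abs <= atol,
        "finite": bool(np.all(np.isfinite(cand))),
        "same_ranking": same_order,
    }
```

Example: `compare_scores(onnx_scores, trt_scores)` where both arguments are the per-pair logits from each engine. A large `max_abs_diff` with `same_ranking=True` is usually less harmful for reranking than a reordered result.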

I've also run into the reranker giving different results between TRT and ONNX. Have you found the cause? 😊

Feb 01 '24 00:02 Arrivederci

Not yet. I'm short on time at the moment but will look into it later; let's share notes if either of us finds something.

Feb 01 '24 02:02 Gcstk

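Hi, what hardware are you running this on? I've also been trying the ONNX conversion recently, but my computer has no GPU. What configuration did you test with, and roughly what inference speed do you get?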

Feb 01 '24 05:02 xhs111

Hi, how did you solve this? By using AutoModelForSequenceClassification?

Apr 17 '24 10:04 hjunjie0324

Yes. Converting to TRT hasn't worked for me yet, though; it's probably a version issue. You could try converting with the GitHub link above.

Apr 17 '24 11:04 Gcstk

@Gcstk Did you compare the GPU inference speed of the original PyTorch model and the ONNX version? My ONNX model is far slower than the original PyTorch model (token length 2048, batch = 4: PyTorch takes 4 s, ONNX takes 26 s).

Jul 23 '24 07:07 Tian14267
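When comparing PyTorch and ONNX Runtime latency, warm-up costs (lazy initialisation, CUDA context creation, kernel selection) and asynchronous GPU execution can distort single-shot timings. A minimal, framework-agnostic harness, assuming each backend is wrapped in a zero-argument callable; `benchmark` is a hypothetical name.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 3, iters: int = 10) -> Dict[str, float]:
    """Time `run_once` after a few untimed warm-up calls; returns seconds."""
    for _ in range(warmup):
        run_once()  # not timed: first calls include one-off setup costs
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
    }
```

For PyTorch on GPU, end the callable with `torch.cuda.synchronize()`, because CUDA kernels launch asynchronously and the timer would otherwise stop early. For ONNX Runtime, one thing worth checking is `session.get_providers()`: if `CUDAExecutionProvider` is not listed (for example, the CPU-only `onnxruntime` package is installed, or the CUDA libraries fail to load), the session is running on the CPU.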

FlagEmbedding

Why does converting BAAI / bge-reranker-base to ONNX only give the part before the classification layer?
