
cosyvoice2: after exporting cosyvoice.llm.llm to ONNX, converting it to TensorRT fails

Open dearwind153 opened this issue 8 months ago • 2 comments

**1. The ONNX model exports successfully**

The model is well-formed and valid!
=======================Model1 inputs:
x_s [1, 'seq_len', 1024]
attn_mask [1, 'seq_len', 'seq_len']
key_cache.1 [7, 8, 'seq_len', 128]
value_cache.1 [7, 8, 'seq_len', 128]
=======================Model1 outputs:
y_pred [1, 'seq_len', 1024]
key_cache [7, 8, 'seq_len', 128]
value_cache [7, 8, 'seq_len', 128]

**2. Converting the ONNX model to TensorRT fails**

The conversion was done with trtexec.

The error message is as follows:

[04/10/2025-11:04:52] [E] Error[3]: IExecutionContext::setInputShape: Error Code 3: API Usage Error (Parameter check failed, condition: engineDims.d[i] == dims.d[i]. Static dimension mismatch while setting input shape for key_cache.1. Set dimensions are [7,8,32,128]. Expected dimensions are [7,8,1,128].)
[04/10/2025-11:04:52] [E] The engine was built with static shapes for input tensor key_cache.1 but the provided shapes do not match the static shapes!
[04/10/2025-11:04:52] [E] Inference set up failed
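
The "engine was built with static shapes" message typically means the engine was built without an optimization profile for the dynamic seq_len axes, so the build freezes them (here at length 1) and the length-32 cache fed at inference no longer matches. A hedged trtexec invocation that supplies min/opt/max profiles for every dynamic input (a sketch; the file names and the opt/max extents are illustrative, not from the original post):

trtexec --onnx=llm.onnx --saveEngine=llm.plan \
    --minShapes=x_s:1x1x1024,attn_mask:1x1x1,key_cache.1:7x8x1x128,value_cache.1:7x8x1x128 \
    --optShapes=x_s:1x64x1024,attn_mask:1x64x64,key_cache.1:7x8x64x128,value_cache.1:7x8x64x128 \
    --maxShapes=x_s:1x1024x1024,attn_mask:1x1024x1024,key_cache.1:7x8x1024x128,value_cache.1:7x8x1024x128

Note that the export labels both the query length of x_s and the cache length of key_cache.1 with the same 'seq_len' symbol, yet during decoding they differ (1 vs. 32 in the log above), so they probably need to be declared as separate dynamic axes at export time.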
  1. Tracing the call chain shows that the model goes through
     transformer.qwen2.Qwen2Encoder
         ->  site-packages/transformers/models/qwen2/modeling_qwen2.py

     which uses DynamicCache internally

The forward function currently being exported is defined as follows:

from typing import Tuple

import torch


def infer_forward(
    self,
    xs: torch.Tensor,
    # zero-size tensor defaults stand in for an empty cache on the first step
    att_mask: torch.Tensor = torch.ones((0, 0, 0), dtype=torch.bool),
    key_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32),
    value_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32),
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
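
For reference, a minimal export sketch that would produce the input/output names and dynamic 'seq_len' axes shown in the checker output above. This is an assumption of how such an export could look, not the actual CosyVoice export script; llm, the dummy extents, and the opset are illustrative:

import torch

# `llm` stands in for the module whose forward is infer_forward above
dummy_xs = torch.randn(1, 1, 1024)
dummy_mask = torch.ones(1, 1, 1, dtype=torch.bool)
dummy_k = torch.zeros(7, 8, 1, 128)
dummy_v = torch.zeros(7, 8, 1, 128)

torch.onnx.export(
    llm,
    (dummy_xs, dummy_mask, dummy_k, dummy_v),
    "llm.onnx",
    input_names=["x_s", "attn_mask", "key_cache.1", "value_cache.1"],
    output_names=["y_pred", "key_cache", "value_cache"],
    dynamic_axes={
        "x_s": {1: "seq_len"},
        "attn_mask": {1: "seq_len", 2: "seq_len"},
        "key_cache.1": {2: "seq_len"},
        "value_cache.1": {2: "seq_len"},
        "y_pred": {1: "seq_len"},
        "key_cache": {2: "seq_len"},
        "value_cache": {2: "seq_len"},
    },
    opset_version=17,
)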

In the implementation, key_cache and value_cache are first allocated with shape (7, 8, T, 128),

then split along the first (layer) dimension, and the per-layer tensors are assigned to self.key_cache and self.value_cache of a newly created DynamicCache, roughly as sketched below.
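
A minimal sketch of that pattern, assuming the transformers DynamicCache API in which key_cache/value_cache are per-layer Python lists; the unsqueeze that restores a batch axis is an assumption, not from the post:

import torch
from transformers.cache_utils import DynamicCache

# key_cache / value_cache as described above: (layers=7, kv_heads=8, T, head_dim=128)
cache = DynamicCache()
# unbind along the layer dimension; unsqueeze(0) adds the batch axis (assumed)
# that per-layer cache tensors usually carry
cache.key_cache = [k.unsqueeze(0) for k in torch.unbind(key_cache, dim=0)]
cache.value_cache = [v.unsqueeze(0) for v in torch.unbind(value_cache, dim=0)]

Because DynamicCache grows these lists through data-dependent Python control flow, the ONNX tracer can freeze the trace-time cache length into the graph, which would be consistent with the static-shape mismatch trtexec reports above.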

Could the internals of DynamicCache be what breaks the ONNX-to-TensorRT conversion? If so, how can it be solved?

dearwind153 avatar Apr 10 '25 07:04 dearwind153

The maintainers have not looked into converting the LLM to TRT.

aluminumbox avatar Apr 15 '25 02:04 aluminumbox

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar May 16 '25 02:05 github-actions[bot]


Hi, could you share where the code for exporting the LLM's ONNX file lives? Thanks a lot!

llf30 avatar Aug 05 '25 06:08 llf30