
cosyvoice2: after exporting cosyvoice.llm.llm to ONNX, converting it to TensorRT fails

Open dearwind153 opened this issue 8 months ago • 2 comments

**1. The ONNX model exports successfully**

The model is well-formed and valid!
=======================Model1 inputs:
x_s [1, 'seq_len', 1024]
attn_mask [1, 'seq_len', 'seq_len']
key_cache.1 [7, 8, 'seq_len', 128]
value_cache.1 [7, 8, 'seq_len', 128]
=======================Model1 outputs:
y_pred [1, 'seq_len', 1024]
key_cache [7, 8, 'seq_len', 128]
value_cache [7, 8, 'seq_len', 128]

**2. Converting the ONNX model to TensorRT fails**

The conversion was done with trtexec.

The error message is as follows:

[04/10/2025-11:04:52] [E] Error[3]: IExecutionContext::setInputShape: Error Code 3: API Usage Error (Parameter check failed, condition: engineDims.d[i] == dims.d[i]. Static dimension mismatch while setting input shape for key_cache.1. Set dimensions are [7,8,32,128]. Expected dimensions are [7,8,1,128].)
[04/10/2025-11:04:52] [E] The engine was built with static shapes for input tensor key_cache.1 but the provided shapes do not match the static shapes!
[04/10/2025-11:04:52] [E] Inference set up failed
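
The "engine was built with static shapes" message typically means the engine was built without an optimization profile for the dynamic seq_len axes, so the build freezes them (here at length 1) and the length-32 cache fed at inference no longer matches. A hedged trtexec invocation that supplies min/opt/max profiles for every dynamic input (a sketch; the file names and the opt/max extents are illustrative, not from the original post):

trtexec --onnx=llm.onnx --saveEngine=llm.plan \
    --minShapes=x_s:1x1x1024,attn_mask:1x1x1,key_cache.1:7x8x1x128,value_cache.1:7x8x1x128 \
    --optShapes=x_s:1x64x1024,attn_mask:1x64x64,key_cache.1:7x8x64x128,value_cache.1:7x8x64x128 \
    --maxShapes=x_s:1x1024x1024,attn_mask:1x1024x1024,key_cache.1:7x8x1024x128,value_cache.1:7x8x1024x128

Note that the export labels both the query length of x_s and the cache length of key_cache.1 with the same 'seq_len' symbol, yet during decoding they differ (1 vs. 32 in the log above), so they probably need to be declared as separate dynamic axes at export time.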
  1. Tracing the call chain shows that the model goes through
     transformer.qwen2.Qwen2Encoder
         ->  site-packages/transformers/models/qwen2/modeling_qwen2.py

     which uses DynamicCache internally

The forward function currently being exported is defined as follows:

from typing import Tuple

import torch


def infer_forward(
    self,
    xs: torch.Tensor,
    # zero-size tensor defaults stand in for an empty cache on the first step
    att_mask: torch.Tensor = torch.ones((0, 0, 0), dtype=torch.bool),
    key_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32),
    value_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32),
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
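
For reference, a minimal export sketch that would produce the input/output names and dynamic 'seq_len' axes shown in the checker output above. This is an assumption of how such an export could look, not the actual CosyVoice export script; llm, the dummy extents, and the opset are illustrative:

import torch

# `llm` stands in for the module whose forward is infer_forward above
dummy_xs = torch.randn(1, 1, 1024)
dummy_mask = torch.ones(1, 1, 1, dtype=torch.bool)
dummy_k = torch.zeros(7, 8, 1, 128)
dummy_v = torch.zeros(7, 8, 1, 128)

torch.onnx.export(
    llm,
    (dummy_xs, dummy_mask, dummy_k, dummy_v),
    "llm.onnx",
    input_names=["x_s", "attn_mask", "key_cache.1", "value_cache.1"],
    output_names=["y_pred", "key_cache", "value_cache"],
    dynamic_axes={
        "x_s": {1: "seq_len"},
        "attn_mask": {1: "seq_len", 2: "seq_len"},
        "key_cache.1": {2: "seq_len"},
        "value_cache.1": {2: "seq_len"},
        "y_pred": {1: "seq_len"},
        "key_cache": {2: "seq_len"},
        "value_cache": {2: "seq_len"},
    },
    opset_version=17,
)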

In the implementation, key_cache and value_cache are first allocated with shape (7, 8, T, 128),

then split along the first (layer) dimension, and the per-layer tensors are assigned to self.key_cache and self.value_cache of a newly created DynamicCache, roughly as sketched below.
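
A minimal sketch of that pattern, assuming the transformers DynamicCache API in which key_cache/value_cache are per-layer Python lists; the unsqueeze that restores a batch axis is an assumption, not from the post:

import torch
from transformers.cache_utils import DynamicCache

# key_cache / value_cache as described above: (layers=7, kv_heads=8, T, head_dim=128)
cache = DynamicCache()
# unbind along the layer dimension; unsqueeze(0) adds the batch axis (assumed)
# that per-layer cache tensors usually carry
cache.key_cache = [k.unsqueeze(0) for k in torch.unbind(key_cache, dim=0)]
cache.value_cache = [v.unsqueeze(0) for v in torch.unbind(value_cache, dim=0)]

Because DynamicCache grows these lists through data-dependent Python control flow, the ONNX tracer can freeze the trace-time cache length into the graph, which would be consistent with the static-shape mismatch trtexec reports above.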

Could the internals of DynamicCache be what breaks the ONNX-to-TensorRT conversion? If so, how can it be solved?

dearwind153 avatar Apr 10 '25 07:04 dearwind153

The maintainers have not looked into converting the LLM to TRT.

aluminumbox avatar Apr 15 '25 02:04 aluminumbox

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar May 16 '25 02:05 github-actions[bot]


Hi, could you share where the code for exporting the LLM's ONNX file lives? Thanks a lot!

llf30 avatar Aug 05 '25 06:08 llf30