cosyvoice2: converting cosyvoice.llm.llm to TensorRT fails after ONNX export
**1. The ONNX model exports correctly**
The model is well-formed and valid!
=======================Model1 inputs:
x_s [1, 'seq_len', 1024]
attn_mask [1, 'seq_len', 'seq_len']
key_cache.1 [7, 8, 'seq_len', 128]
value_cache.1 [7, 8, 'seq_len', 128]
=======================Model1 outputs:
y_pred [1, 'seq_len', 1024]
key_cache [7, 8, 'seq_len', 128]
value_cache [7, 8, 'seq_len', 128]
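For reference, the validation output above can be reproduced with a short script along these lines (a minimal sketch; the file name `llm.onnx` is a placeholder, not the repo's actual export path):

```python
import onnx

# Placeholder path to the exported LLM graph; adjust to your export location.
model = onnx.load("llm.onnx")

# Raises ValidationError if the graph is malformed.
onnx.checker.check_model(model)
print("The model is well-formed and valid!")

def dims(value_info):
    # Symbolic axes (e.g. 'seq_len') live in dim_param, fixed axes in dim_value.
    return [d.dim_param or d.dim_value for d in value_info.type.tensor_type.shape.dim]

print("=======================Model1 inputs:")
for inp in model.graph.input:
    print(inp.name, dims(inp))
print("=======================Model1 outputs:")
for out in model.graph.output:
    print(out.name, dims(out))
```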
**2. Converting the ONNX model to TensorRT fails**
The conversion uses trtexec; the error output is as follows:
[04/10/2025-11:04:52] [E] Error[3]: IExecutionContext::setInputShape: Error Code 3: API Usage Error (Parameter check failed, condition: engineDims.d[i] == dims.d[i]. Static dimension mismatch while setting input shape for key_cache.1. Set dimensions are [7,8,32,128]. Expected dimensions are [7,8,1,128].)
[04/10/2025-11:04:52] [E] The engine was built with static shapes for input tensor key_cache.1 but the provided shapes do not match the static shapes!
[04/10/2025-11:04:52] [E] Inference set up failed
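The error means the engine was built treating the third axis of key_cache.1 as static (size 1), so it rejects the runtime shape [7,8,32,128]. If that axis really is dynamic in the ONNX graph, the usual fix is to build with explicit shape profiles, either via trtexec's `--minShapes`/`--optShapes`/`--maxShapes` flags or via an optimization profile in the TensorRT Python builder API. A minimal sketch of the latter, with assumed file names and an assumed 1024-token upper bound on seq_len:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("llm.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
profile = builder.create_optimization_profile()

MAX_LEN = 1024  # assumed upper bound on seq_len
# One (min, opt, max) triple per dynamic input, matching the shapes printed above.
profile.set_shape("x_s",           (1, 1, 1024),   (1, 32, 1024),   (1, MAX_LEN, 1024))
profile.set_shape("attn_mask",     (1, 1, 1),      (1, 32, 32),     (1, MAX_LEN, MAX_LEN))
profile.set_shape("key_cache.1",   (7, 8, 1, 128), (7, 8, 32, 128), (7, 8, MAX_LEN, 128))
profile.set_shape("value_cache.1", (7, 8, 1, 128), (7, 8, 32, 128), (7, 8, MAX_LEN, 128))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)
with open("llm.plan", "wb") as f:
    f.write(engine)
```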
- Tracing the call chain shows that the export goes through
transformer.qwen2.Qwen2Encoder
-> site-packages/transformers/models/qwen2/modeling_qwen2.py
which internally uses DynamicCache.
The forward function currently being exported is defined as follows:
    def infer_forward(
        self,
        xs: torch.Tensor,
        att_mask: torch.Tensor = torch.ones((0, 0, 0), dtype=torch.bool),
        key_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32),
        value_cache: torch.Tensor = torch.zeros((0, 0, 0, 0), dtype=torch.float32)
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
In the implementation, key_cache and value_cache are first allocated with shape (7, 8, T, 128), then split along the first dimension and assigned to self.key_cache and self.value_cache of a freshly created DynamicCache. Could this internal use of DynamicCache be what breaks the ONNX-to-TensorRT conversion, and how can it be fixed? (A tensor-only alternative is sketched below.)
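DynamicCache is a plausible culprit: its Python-side list bookkeeping is invisible to the ONNX tracer, so the cache length seen at trace time can get baked into the graph as a constant. A common workaround is to keep the cache as plain tensors end to end. This is a hedged sketch under that assumption, with hypothetical helper names: split the stacked (num_layers, heads, T, head_dim) inputs into per-layer pairs, grow them with torch.cat (which exports as a Concat node with a symbolic length), and reassemble them for the outputs.

```python
from typing import List, Tuple
import torch

def split_cache(key_cache: torch.Tensor, value_cache: torch.Tensor
                ) -> List[Tuple[torch.Tensor, torch.Tensor]]:
    # (num_layers, heads, T, head_dim) -> per-layer (key, value) pairs,
    # adding a batch axis of 1 so attention sees (1, heads, T, head_dim).
    return [(key_cache[i].unsqueeze(0), value_cache[i].unsqueeze(0))
            for i in range(key_cache.shape[0])]

def append_kv(past_k: torch.Tensor, past_v: torch.Tensor,
              new_k: torch.Tensor, new_v: torch.Tensor
              ) -> Tuple[torch.Tensor, torch.Tensor]:
    # Pure tensor ops along the time axis, unlike DynamicCache's list mutation.
    return torch.cat([past_k, new_k], dim=2), torch.cat([past_v, new_v], dim=2)

def stack_cache(pairs: List[Tuple[torch.Tensor, torch.Tensor]]
                ) -> Tuple[torch.Tensor, torch.Tensor]:
    # Reassemble the (num_layers, heads, T, head_dim) cache outputs.
    return (torch.cat([k for k, _ in pairs], dim=0),
            torch.cat([v for _, v in pairs], dim=0))
```

Even with this pattern, the TensorRT build only succeeds if the cache's time axis is actually declared in dynamic_axes at export time; if the dumped ONNX shows a fixed size on that axis, re-export with it marked dynamic.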
The team has not investigated converting the LLM to TensorRT.
Hi, could you share the code you used to export the LLM to ONNX? Thanks a lot.