CosyVoice

token2wav is expensive: what directions are there for optimizing its performance?

Open wenyangchou opened this issue 8 months ago • 2 comments

During audio decoding, this call in flow is the bottleneck:

```python
feat, _ = self.decoder(
    mu=h.transpose(1, 2).contiguous(),
    mask=mask.unsqueeze(1),
    spks=embedding,
    cond=conds,
    n_timesteps=10,  # number of flow sampling steps used at inference time
)
```

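On an A800-80G GPU with TensorRT (trt) enabled, the steady-state latency of this call is about 200 ms, and hift adds roughly another 50 ms. As a result, in streaming inference the first-packet latency never drops below 250 ms.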
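A rough latency budget makes these numbers concrete. The sketch below assumes, hypothetically, that the flow decoder's cost is a fixed part plus a part that grows linearly with `n_timesteps`, calibrated to the reported ~200 ms at `n_timesteps=10` and ~50 ms for hift. `first_packet_latency_ms` and `fixed_flow_ms` are illustrative names, and LLM token generation and any other pipeline stages are ignored.

```python
def first_packet_latency_ms(
    n_timesteps: int,
    flow_ms_at_10: float = 200.0,  # reported flow latency at n_timesteps=10 (A800, TensorRT)
    hift_ms: float = 50.0,         # reported hift latency
    fixed_flow_ms: float = 0.0,    # hypothetical flow cost that does not scale with steps
) -> float:
    """Estimate flow + hift latency, assuming a constant cost per flow step."""
    per_step_ms = (flow_ms_at_10 - fixed_flow_ms) / 10
    return fixed_flow_ms + per_step_ms * n_timesteps + hift_ms


if __name__ == "__main__":
    for steps in (10, 5, 3):
        print(steps, first_packet_latency_ms(steps))
```

Under the purely linear assumption this gives 250.0, 150.0 and 110.0 ms for 10, 5 and 3 steps. Any nonzero `fixed_flow_ms` shrinks those savings, and the hift share is unaffected by the step count, so real measurements are needed to see how much of the 200 ms actually scales with `n_timesteps`.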

Are there any other optimization approaches or directions for the decoding stage?

wenyangchou, Apr 19 '25 01:04

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], May 19 '25 02:05

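Reduce n_timesteps, i.e. the number of sampling steps flow uses at inference time. In my experience the impact on output quality is acceptable.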
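Why this works can be illustrated with a toy fixed-step Euler ODE sampler of the kind used by flow-matching decoders: each step calls the velocity estimator once, so compute grows linearly with `n_timesteps`. This is a stdlib-only sketch, not CosyVoice's decoder; `euler_sample`, the two toy velocity fields, and the optional cosine time warping are assumptions for illustration.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def euler_sample(
    x0: Vector, velocity: Velocity, n_timesteps: int, cosine_schedule: bool = False
) -> Tuple[Vector, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps.

    Returns the final state and the number of velocity (estimator) calls.
    """
    ts = [i / n_timesteps for i in range(n_timesteps + 1)]
    if cosine_schedule:
        # Optional warping: smaller steps near t=0, larger steps near t=1.
        ts = [1 - math.cos(t * 0.5 * math.pi) for t in ts]
    x = list(x0)
    calls = 0
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t)  # one estimator forward pass per step
        calls += 1
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x, calls


def straight(x: Vector, t: float) -> Vector:
    # Constant velocity: a perfectly straight trajectory.
    return [1.0, -1.0]


def curved(x: Vector, t: float) -> Vector:
    # dx/dt = x: a curved trajectory whose exact value at t=1 is x0 * e.
    return list(x)


if __name__ == "__main__":
    # Straight path: Euler is exact with any number of steps.
    print(euler_sample([0.0, 1.0], straight, 2, cosine_schedule=True))
    # Curved path: fewer steps give a larger error against the exact answer e.
    for n in (2, 10):
        x, _ = euler_sample([1.0], curved, n)
        print(n, abs(x[0] - math.e))
```

How much quality survives fewer steps depends on how straight the learned trajectories are; the straight case is consistent with the observation above, but whether it holds for a given checkpoint needs listening tests on the real model.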

FlynnFlag, Sep 01 '25 07:09