token2wav is very expensive; what directions are there for optimizing its performance?

Open wenyangchou opened this issue 8 months ago • 2 comments

During audio decoding, this call in flow:

feat, _ = self.decoder(
    mu=h.transpose(1, 2).contiguous(),
    mask=mask.unsqueeze(1),
    spks=embedding,
    cond=conds,
    n_timesteps=10
)

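On an A800-80G GPU with TRT enabled, this call has a steady-state latency of around 200 ms, and hift adds roughly another 50 ms. As a result, in streaming inference the first-packet latency can never get below 250 ms.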
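To make the budget explicit, here is a minimal sketch of the arithmetic behind that 250 ms floor. The 200 ms and 50 ms figures are the ones reported above; the helper names and the 150 ms target are hypothetical and only illustrate how much the flow decoder would have to shrink for a given first-packet goal, assuming the two stages run back to back.

```python
# Latencies reported in this issue (A800-80G, TRT enabled).
FLOW_DECODER_MS = 200.0
HIFT_MS = 50.0


def first_packet_floor_ms(flow_ms, hift_ms):
    """Lower bound on first-packet latency when flow and hift run sequentially."""
    return flow_ms + hift_ms


def max_flow_ms_for_target(target_ms, hift_ms):
    """Largest flow-decoder latency that still meets a first-packet target."""
    return max(target_ms - hift_ms, 0.0)


print(first_packet_floor_ms(FLOW_DECODER_MS, HIFT_MS))  # 250.0
# Hypothetical target of 150 ms: the flow decoder would need to fit in 100 ms.
print(max_flow_ms_for_target(150.0, HIFT_MS))  # 100.0
```

Since hift is the smaller share, most of any saving has to come from the flow decoder call.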

Are there any other optimization approaches or directions for the decoding stage?

Apr 19 '25 01:04 wenyangchou

This issue is stale because it has been open for 30 days with no activity.

May 19 '25 02:05 github-actions[bot]

Reduce n_timesteps, i.e. the number of sampling steps flow uses at inference time. The impact on quality seems acceptable.
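A toy sketch of why this trades speed for accuracy, assuming the decoder integrates an ODE with a fixed-step Euler solver so that each step costs one network forward pass. The names `euler_solve` and `toy_velocity` are hypothetical stand-ins, not the project's code, and the toy ODE says nothing about actual audio quality.

```python
import math


def euler_solve(velocity, x0, n_timesteps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of velocity evaluations, which
    stands in for the number of network forward passes.
    """
    dt = 1.0 / n_timesteps
    x, t, calls = x0, 0.0, 0
    for _ in range(n_timesteps):
        x = x + dt * velocity(x, t)
        t += dt
        calls += 1
    return x, calls


def toy_velocity(x, t):
    # Toy stand-in for the network: dx/dt = -x, so x(1) = x0 * exp(-1).
    return -x


exact = math.exp(-1.0)
for n in (10, 5, 2):
    x1, calls = euler_solve(toy_velocity, 1.0, n)
    print(f"n_timesteps={n:2d}  calls={calls:2d}  abs_error={abs(x1 - exact):.4f}")
```

The call count drops in proportion to n_timesteps while the integration error grows, so it is worth listening to outputs at a few settings before settling on a smaller value.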

Sep 01 '25 07:09 FlynnFlag