What is the best practice for extra long audio?

Open dingkwang opened this issue 8 months ago • 3 comments

For long audio (e.g., > 20 min), what is the best practice? What do you think of the following method?

Mar 30 '25 02:03 dingkwang

Use VAD to split the long wav into segments, each shorter than 60s (FireRedASR-AED) or 30s (FireRedASR-LLM).
Then run FireRedASR on each segment to do ASR.
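
The splitting step above could be sketched roughly as follows. This is only an illustration, not FireRedASR code: it assumes some VAD has already produced speech intervals as (start, end) pairs in seconds, and the helper name `pack_segments` is hypothetical. It greedily merges neighbouring speech intervals into chunks up to a length limit, cutting only at pauses where it can, and hard-splits a single interval only when that interval alone exceeds the limit.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), as reported by any VAD


def pack_segments(speech: List[Segment], max_len: float) -> List[Segment]:
    """Greedily merge VAD speech segments into chunks of at most max_len seconds.

    Hypothetical helper for illustration; not part of FireRedASR.
    """
    chunks: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None

    for start, end in sorted(speech):
        # A single speech segment longer than the limit has to be cut mid-speech.
        while end - start > max_len:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = cur_end = None
            chunks.append((start, start + max_len))
            start += max_len

        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len:
            # Still fits: extend the current chunk across the pause.
            cur_end = end
        else:
            # Does not fit: close the chunk at the pause and start a new one.
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end

    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


if __name__ == "__main__":
    # Made-up VAD output, for illustration only.
    vad_output = [(0.0, 10.0), (12.0, 25.0), (27.0, 40.0), (45.0, 100.0)]
    print(pack_segments(vad_output, max_len=30.0))  # limit quoted for FireRedASR-LLM
    print(pack_segments(vad_output, max_len=60.0))  # limit quoted for FireRedASR-AED
```

Each returned (start, end) pair can then be cut out of the wav and passed to the recognizer. Pass a limit slightly below 60 or 30 if the segments must be strictly shorter than those values.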

Mar 31 '25 03:03 FireRedTeam

But splitting wav may cause incomplete sentences or lost context.

Mar 31 '25 03:03 dingkwang

All of these problems ultimately come down to the LLM's memory and context length, so it is a kind of tradeoff.

May 20 '25 09:05 Albertchamberlain