What is the best practice for extra long audio?

Open dingkwang opened this issue 8 months ago • 3 comments

For long audio (e.g., > 20 min), what is the best practice? What do you think of the following method?

  • Split into 30s chunks, with 10s overlap.
  • Get a transcript for each chunk.
  • Stitch the chunk transcripts into a full transcript, either manually or with an LLM.
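
A minimal sketch of the method above, assuming 30s windows with a 10s overlap. The names `chunk_spans` and `merge_overlap` are hypothetical helpers, not part of FireRedASR. The stitching step only handles the easy case where the overlapping region is transcribed identically in both chunks. Real ASR output often differs near chunk edges, which is where manual review or an LLM would come in.

```python
def chunk_spans(total_s, chunk_s=30.0, overlap_s=10.0):
    """Return (start, end) times in seconds for overlapping chunks."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk length")
    step = chunk_s - overlap_s
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last chunk reached the end of the audio
        start += step
    return spans


def merge_overlap(left, right):
    """Join two token lists, dropping the longest suffix of `left`
    that exactly matches a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right  # no exact overlap found, just concatenate


spans = chunk_spans(70)
merged = merge_overlap("the quick brown fox".split(), "brown fox jumps".split())
```

The merge works on any list of tokens, so for Chinese transcripts a list of characters could be passed instead of words.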

dingkwang, Mar 30 '25 02:03

  1. Use VAD to split the long wav into segments, each shorter than 60s (FireRedASR-AED) or 30s (FireRedASR-LLM).
  2. Run FireRedASR on each segment.
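
A rough sketch of step 1, assuming some VAD has already produced a list of speech regions as (start, end) times in seconds. `pack_segments` is a hypothetical helper, not a FireRedASR function. It greedily merges neighbouring speech regions into pieces no longer than `max_s`, so cuts fall on silences where possible, and it hard-splits only when a single region exceeds the limit. To stay strictly under the 60s or 30s limits mentioned above, a slightly smaller `max_s` can be passed.

```python
def pack_segments(speech, max_s):
    """Greedily pack VAD speech regions into pieces of at most max_s seconds.

    speech: list of (start, end) tuples in seconds, sorted by start time.
    """
    pieces = []
    for start, end in speech:
        # A single region longer than the limit has to be cut mid-speech.
        while end - start > max_s:
            pieces.append((start, start + max_s))
            start += max_s
        if pieces and end - pieces[-1][0] <= max_s:
            # Extend the current piece across the silence gap.
            pieces[-1] = (pieces[-1][0], end)
        else:
            pieces.append((start, end))
    return pieces


# Hypothetical VAD output for a short recording.
vad_regions = [(0, 10), (12, 25), (40, 58), (59, 70)]
pieces_llm = pack_segments(vad_regions, max_s=30)  # FireRedASR-LLM limit
pieces_aed = pack_segments(vad_regions, max_s=60)  # FireRedASR-AED limit
```

Each resulting piece would then be cut from the wav and transcribed separately, and the transcripts concatenated in order.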

FireRedTeam, Mar 31 '25 03:03

But splitting the wav may cut sentences in the middle or lose context across segments.

dingkwang, Mar 31 '25 03:03

All of these problems ultimately come down to the LLM's memory and context length. It's a kind of tradeoff, I suppose.

Albertchamberlain, May 20 '25 09:05