FireRedASR
What is the best practice for extra long audio?
For long audio (e.g., > 20 min), what is the best practice? What do you think of the following method?
- Split into 30s chunks, with 10s overlap.
- Get a transcript for each chunk.
- Stitch the chunk transcripts into a full transcript, either manually or with an LLM.
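
The chunking step in the method above can be sketched as follows. This is only an illustration of the 30s window / 10s overlap arithmetic, not anything provided by FireRedASR; `chunk_spans` and `stitch` are hypothetical helper names, and `stitch` is a naive exact-match merge that assumes both chunks transcribed the overlapping audio identically, which real ASR output often does not.

```python
def chunk_spans(total_s, window_s=30.0, overlap_s=10.0):
    """Return (start, end) times in seconds for overlapping chunks."""
    if overlap_s >= window_s:
        raise ValueError("overlap must be smaller than the window")
    step = window_s - overlap_s  # each chunk starts 20s after the previous one
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        spans.append((start, end))
        if end >= total_s:  # the last chunk reached the end of the audio
            break
        start += step
    return spans


def stitch(left, right):
    """Join two token lists, dropping the longest exact overlap
    between the end of `left` and the start of `right` (naive)."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right  # no exact overlap found: plain concatenation


# Example: a 70s recording
print(chunk_spans(70))
print(stitch(["a", "b", "c", "d"], ["c", "d", "e"]))
```

When the two transcripts disagree inside the overlap, exact matching fails and falls back to concatenation, which is where manual review or an LLM pass would come in.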
- Use VAD to split the long wav into segments, each shorter than 60s (FireRedASR-AED) or 30s (FireRedASR-LLM).
- Run FireRedASR on each segment.
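
A minimal sketch of the VAD-based splitting, assuming some VAD tool has already produced a list of speech `(start, end)` times in seconds. The function `pack_segments` and the `MAX_S` table are hypothetical names for illustration; only the 60s and 30s limits come from this thread. It greedily merges neighbouring speech segments up to the limit, and hard-splits any single segment that is too long on its own.

```python
# Length limits mentioned in this thread, in seconds.
MAX_S = {"FireRedASR-AED": 60.0, "FireRedASR-LLM": 30.0}


def pack_segments(speech, max_s):
    """Group consecutive VAD speech segments so that each group,
    measured from its first start to its last end, is at most max_s."""
    groups = []
    cur_start = cur_end = None
    for start, end in speech:
        # A single segment longer than the limit is cut into fixed pieces.
        pieces = []
        while end - start > max_s:
            pieces.append((start, start + max_s))
            start += max_s
        pieces.append((start, end))
        for s, e in pieces:
            if cur_start is not None and e - cur_start <= max_s:
                cur_end = e  # still fits: extend the current group
            else:
                if cur_start is not None:
                    groups.append((cur_start, cur_end))
                cur_start, cur_end = s, e  # start a new group
    if cur_start is not None:
        groups.append((cur_start, cur_end))
    return groups


# Example with made-up VAD output (seconds)
vad_out = [(0, 10), (12, 25), (27, 40), (45, 120)]
print(pack_segments(vad_out, MAX_S["FireRedASR-LLM"]))
```

Cutting at VAD pauses keeps most sentences intact, but the hard split of an over-long segment can still land mid-sentence.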
But splitting the wav may produce incomplete sentences or lose context across segment boundaries.
All of these issues ultimately come down to the LLM's memory and context length, so it is a tradeoff of sorts.