echogarden
Recognition: implement beam search for Whisper decoder
Beam search would enable the decoder to consider multiple candidate transcriptions (hypotheses) simultaneously, instead of committing to a single decoding path.
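The idea can be illustrated with a minimal, simplified Python sketch. It assumes a hypothetical `toy_step` function as a stand-in for one Whisper decoder step, returning log-probabilities over a tiny made-up vocabulary. `beam_search` is an illustrative helper and is not part of echogarden's API. With `beam_width=1` the search reduces to greedy decoding. With `beam_width=2` it finds a higher-probability transcription. The sketch also counts decoder calls to show how the cost grows with the beam width.

```python
import math
from typing import Callable, Dict, List, Tuple

# Placeholder end-of-text token (assumption: modeled on Whisper's <|endoftext|>).
END = "<|endoftext|>"

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, cumulative log-probability)


def toy_step(tokens: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical decoder step: log-probabilities of the next token."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(tokens, {END: 1.0})  # any longer prefix ends the text
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(
    step: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_steps: int,
) -> Tuple[List[Hypothesis], int]:
    """Simplified beam search: returns hypotheses (best first) and decoder call count."""
    live: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    calls = 0
    for _ in range(max_steps):
        # Extend every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in live:
            calls += 1  # one decoder evaluation per live hypothesis
            for token, log_p in step(tokens).items():
                candidates.append((tokens + (token,), score + log_p))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        live = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == END:
                finished.append((tokens, score))
            else:
                live.append((tokens, score))
        if not live:
            break
    # Fall back to unfinished hypotheses if max_steps was reached first.
    results = finished or live
    results.sort(key=lambda c: c[1], reverse=True)
    return results, calls


greedy, greedy_calls = beam_search(toy_step, beam_width=1, max_steps=10)
beam, beam_calls = beam_search(toy_step, beam_width=2, max_steps=10)
print(greedy[0][0], greedy_calls)  # ('a', 'x', '<|endoftext|>') 3
print(beam[0][0], beam_calls)      # ('b', 'z', '<|endoftext|>') 5
```

Greedy decoding commits to `"a"` because it is the most likely first token. The beam keeps `"b"` alive long enough to discover that `"b z"` has a higher overall probability (0.36 vs. 0.33). In exchange, it needs 5 decoder calls instead of 3.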
It is currently not a high priority, for several reasons:
- The goal of the Whisper implementation is a good speed/quality tradeoff. It's not clear that maintaining more than one decoding path would be a good tradeoff in all cases.
- Whisper inference is currently only supported on CPU, where decoder work grows roughly in proportion to the beam width, so even a beam width of 2 would significantly reduce speed.
- It is more important, at this moment, to get real-time and streaming recognition working. Given the extra cost of beam search, it is unlikely to be used in real-time scenarios (at least on CPU).
- There are alternative approaches to getting better quality, such as using a larger model or various guided decoding strategies.