
Recognition: implement beam search for Whisper decoder

Open rotemdan opened this issue 2 years ago • 0 comments

Beam search would enable the decoder to keep several candidate recognitions (hypotheses) alive at each step, instead of following a single decoding path.
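For illustration only, here is a minimal, generic sketch of beam search over token sequences. It is not code from this project: `NextLogProbs`, `Hypothesis`, `beamSearch` and `toyModel` are hypothetical names, and the "model" is a hand-written toy distribution rather than a Whisper decoder. It is meant to show the idea and where the extra cost comes from (one model call per live beam at every step).

```typescript
// Hypothetical interface: given the tokens decoded so far, return
// log-probabilities for every token in the vocabulary.
type NextLogProbs = (tokens: number[]) => number[];

interface Hypothesis {
  tokens: number[];
  logProb: number;   // sum of token log-probabilities
  finished: boolean; // true once the end token has been emitted
}

function beamSearch(
  nextLogProbs: NextLogProbs,
  beamWidth: number,
  maxLength: number,
  endToken: number
): Hypothesis {
  let beams: Hypothesis[] = [{ tokens: [], logProb: 0, finished: false }];

  for (let step = 0; step < maxLength; step++) {
    const candidates: Hypothesis[] = [];

    for (const beam of beams) {
      if (beam.finished) {
        // Finished hypotheses are carried over unchanged.
        candidates.push(beam);
        continue;
      }

      // One model call per live beam: this is the extra cost of beam search.
      const logProbs = nextLogProbs(beam.tokens);

      logProbs.forEach((logProb, token) => {
        candidates.push({
          tokens: [...beam.tokens, token],
          logProb: beam.logProb + logProb,
          finished: token === endToken,
        });
      });
    }

    // Keep only the `beamWidth` highest-scoring hypotheses.
    candidates.sort((a, b) => b.logProb - a.logProb);
    beams = candidates.slice(0, beamWidth);

    if (beams.every((beam) => beam.finished)) break;
  }

  return beams[0];
}

// Toy model with a 3-token vocabulary: 0 = "A", 1 = "B", 2 = end token.
// "A" looks best at the first step, but "B" leads to a better overall path.
const toyModel: NextLogProbs = (tokens) => {
  const last = tokens[tokens.length - 1];
  const probs =
    tokens.length === 0 ? [0.6, 0.39, 0.01] :
    last === 0 ? [0.34, 0.33, 0.33] :
    [0.05, 0.05, 0.9];
  return probs.map((p) => Math.log(p));
};

const single = beamSearch(toyModel, 1, 2, 2); // beam width 1: one decoding path
const wide = beamSearch(toyModel, 2, 2, 2);   // beam width 2: two paths kept
console.log(single.tokens, wide.tokens);
```

With a beam width of 1 the search commits to "A" at the first step, while a beam width of 2 also keeps "B" alive and ends up with the higher-probability sequence "B, end". The price is roughly `beamWidth` model calls per step instead of one.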

Currently not a high priority, for several reasons:

  • The goal of the Whisper implementation is a good speed / quality tradeoff. It's not clear that maintaining more than one decoding path would be worth the cost in all cases.
  • Whisper inference is currently only supported on CPU, meaning even a beam width of 2 would significantly reduce speed.
  • It is more important, at this moment, to get real-time and streaming recognition running. Due to the extra cost of beam search, it's unlikely to be used in real-time situations (at least on CPU).
  • There are alternative approaches to get better quality, such as using a larger model or various guided decoding strategies.

rotemdan, Jul 27 '23 15:07