FireRedASR
Optimize beam search & add flash attention+xformers support
The performance improvement with SDPA is approximately 50%, and with flash attention nearly 100%, depending on the data and the batch size. The greater the difference in audio length, the better the optimization effect. With batch size=1 there is no effect. @kaituoxu
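To make the dependence on audio length and batch size concrete, here is a rough sketch. It is not code from this PR; it assumes the gain comes mostly from avoiding work on padded frames, and `padding_fraction` is a hypothetical helper that only estimates how much of a padded batch is padding.

```python
def padding_fraction(lengths):
    """Fraction of frames in a padded batch that are padding rather than audio.

    `lengths` holds the frame count of each utterance in one batch.
    Assumption: every utterance is padded to the longest one in the batch.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    padded_total = max(lengths) * len(lengths)  # frames after padding
    real_total = sum(lengths)                   # frames that carry audio
    return (padded_total - real_total) / padded_total


if __name__ == "__main__":
    print(padding_fraction([500]))         # batch size 1: nothing is padded
    print(padding_fraction([900, 1000]))   # similar lengths: little padding
    print(padding_fraction([100, 1000]))   # very different lengths: much padding
```

Under that assumption, a batch of one has no padding to skip, which matches the "no effect" note above, and batches with widely varying lengths have the most to gain.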
@xsank The test did not show any performance improvement.
@Xujianzhong Which test? Let me take a look.
Thanks for your PR, we will review.