
Optimize beam search & add flash attention+xformers support

Open xsank opened this issue 1 month ago • 3 comments

SDPA improves performance by approximately 50%, and flash attention by nearly 100%, depending on the data and the batch size. The greater the difference in audio length within a batch, the larger the gain. With batch size = 1 there is no effect. @kaituoxu
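The thread does not say why audio-length differences matter. One plausible factor, stated here as an assumption and not as something the PR confirms, is padding: a batch is padded to its longest utterance, so a wider spread of lengths means more padded frames. A minimal sketch of that arithmetic, where `padding_fraction` is a hypothetical helper and not part of FireRedASR:

```python
def padding_fraction(lengths):
    """Return the share of frames in a padded batch that are padding.

    `lengths` holds the frame count of each utterance; the batch is
    assumed to be padded to the longest utterance.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = max(lengths) * len(lengths)  # frames after padding
    real = sum(lengths)                  # frames carrying audio
    return (total - real) / total


if __name__ == "__main__":
    print(padding_fraction([1000]))              # batch size 1: no padding
    print(padding_fraction([1000, 1000, 1000]))  # equal lengths: no padding
    print(padding_fraction([200, 600, 1000]))    # mixed lengths: some padding
```

Under this assumption, a single-utterance batch has no padding at all, which would be consistent with seeing no effect at batch size = 1.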

xsank avatar Nov 13 '25 06:11 xsank

@xsank The test did not show any performance improvement.

Xujianzhong avatar Nov 14 '25 02:11 Xujianzhong

> @xsank The test did not show any performance improvement.

@Xujianzhong Which test? Let me take a look.

xsank avatar Nov 14 '25 03:11 xsank

Thanks for your PR, we will review.

kaituoxu avatar Nov 24 '25 05:11 kaituoxu