[ROCm] Add Torch SDPA and xFormers optimization for FireRedASR

Open sammysun0711 opened this issue 1 month ago • 1 comment

Hi FireRedTeam, thanks for your great work!

This PR adds FireRedASR optimizations for ROCm, targeting AMD Instinct MI300+ GPUs.

Add docker/Dockerfile.rocm to quickly set up a ROCm 7 environment for deployment.
Add PyTorch SDPA and xFormers attention support for performance optimization, selected through the environment variable ATTENTION_BACKEND="SDPA" or ATTENTION_BACKEND="XFORMERS".
Fix the torch.load issue on torch >= 2.6 by passing weights_only=False.
Add benchmark scripts and torch profiling support for analyzing the performance of the different attention backends.
Add a guide to FireRedASR optimization on ROCm in README.md.
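
For reviewers who want a picture of the ATTENTION_BACKEND switch described above, here is a minimal sketch of how such an environment-variable dispatch could look. It is not the code from this PR: the function names `resolve_attention_backend` and `attention`, the "NATIVE" fallback name, and the behavior when the variable is unset are all assumptions made for illustration. Only the values "SDPA" and "XFORMERS" come from the description.

```python
import os

# Hypothetical names for illustration; not taken from the FireRedASR code base.
# "NATIVE" is an assumed label for the plain matmul/softmax path.
_SUPPORTED_BACKENDS = ("NATIVE", "SDPA", "XFORMERS")


def resolve_attention_backend(default="NATIVE"):
    """Read ATTENTION_BACKEND from the environment and validate it."""
    value = os.environ.get("ATTENTION_BACKEND", default).strip().upper()
    if value not in _SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported ATTENTION_BACKEND={value!r}; "
            f"expected one of {_SUPPORTED_BACKENDS}"
        )
    return value


def attention(q, k, v, backend=None):
    """Compute attention for q, k, v shaped (batch, heads, seq, head_dim)."""
    backend = backend or resolve_attention_backend()
    if backend == "SDPA":
        # Imported lazily so that only the selected backend must be installed.
        import torch.nn.functional as F
        return F.scaled_dot_product_attention(q, k, v)
    if backend == "XFORMERS":
        import xformers.ops as xops
        # xFormers expects (batch, seq, heads, head_dim), so swap dims 1 and 2.
        out = xops.memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # Assumed fallback: explicit softmax(QK^T / sqrt(d)) V.
    import torch
    scores = torch.matmul(q, k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

With a layout like this, exporting ATTENTION_BACKEND="SDPA" or ATTENTION_BACKEND="XFORMERS" before launching inference would select the corresponding path, and an unrecognized value fails early instead of silently falling back.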

Here are performance results with the example audio (batch size = 1) on a single MI308X for your reference:

Oct 29 '25 08:10 sammysun0711

Thanks for your PR, we will review.

Nov 24 '25 05:11 kaituoxu