[ROCm] Add Torch SDPA and xFormers optimization for FireRedASR

Open sammysun0711 opened this issue 1 month ago • 1 comment

Hi FireRedTeam, thanks for your great work!

This PR adds FireRedASR optimizations for ROCm, targeting AMD Instinct MI300+ GPUs.

  • Add docker/Dockerfile.rocm to quickly set up a ROCm 7 environment for deployment.
  • Add PyTorch SDPA and xFormers attention support for performance optimization, selectable through the environment variable ATTENTION_BACKEND="SDPA" or ATTENTION_BACKEND="XFORMERS".
  • Fix the torch.load issue on torch >= 2.6 by passing weights_only=False (the default changed to weights_only=True in 2.6).
  • Add benchmark scripts and torch profiling support for performance analysis of the different attention backends.
  • Add a guide to FireRedASR optimization on ROCm in README.md.
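
To illustrate the ATTENTION_BACKEND switch described in the list, here is a minimal sketch of how such a dispatch could look. It is not the code from this PR: the function names `resolve_backend` and `attention`, and the fallback to a native implementation when the variable is unset, are assumptions for illustration. The sketch also leaves out masking and any positional terms that the real attention module may use.

```python
import math
import os


def resolve_backend(env=None):
    """Read ATTENTION_BACKEND and return 'NATIVE', 'SDPA', or 'XFORMERS'.

    Assumption: an unset variable means the native implementation.
    """
    env = os.environ if env is None else env
    name = env.get("ATTENTION_BACKEND", "NATIVE").strip().upper()
    if name not in ("NATIVE", "SDPA", "XFORMERS"):
        raise ValueError(f"unsupported ATTENTION_BACKEND: {name!r}")
    return name


def attention(q, k, v, backend=None):
    """Plain (unmasked) attention; q, k, v are (batch, heads, seq, head_dim)."""
    # Imported lazily so the backend lookup works without torch installed.
    import torch
    import torch.nn.functional as F

    backend = backend or resolve_backend()
    if backend == "SDPA":
        # Fused kernel selected by PyTorch.
        return F.scaled_dot_product_attention(q, k, v)
    if backend == "XFORMERS":
        import xformers.ops as xops

        # xFormers expects (batch, seq, heads, head_dim), so swap axes 1 and 2.
        out = xops.memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # Native path: softmax(Q K^T / sqrt(d)) V.
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

Under these assumptions, running with `ATTENTION_BACKEND="SDPA"` in the environment would route every call through `F.scaled_dot_product_attention`, and leaving the variable unset keeps the native path.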

Here are performance results with the example audio (batch size = 1) on a single MI308X, for your reference:

| ATTENTION_BACKEND  | RTF   | Performance gain vs. Native |
|--------------------|-------|-----------------------------|
| Native             | 0.063 | /                           |
| Torch SDPA         | 0.048 | 23.81%                      |
| xFormers Attention | 0.056 | 11.11%                      |
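
For readers checking the numbers, the gain column is consistent with the relative reduction in RTF against the native backend. The sketch below assumes the usual definition of RTF (processing time divided by audio duration, lower is better); the helper names are hypothetical and are not taken from the PR's benchmark scripts.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio (lower is better)."""
    return processing_seconds / audio_seconds


def gain_vs_native(native_rtf, backend_rtf):
    """Relative RTF reduction versus the native backend, in percent."""
    return (native_rtf - backend_rtf) / native_rtf * 100.0


# RTF values reported in the table above.
for name, rtf in [("Torch SDPA", 0.048), ("xFormers Attention", 0.056)]:
    print(f"{name}: {gain_vs_native(0.063, rtf):.2f}%")
```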

sammysun0711, Oct 29 '25 08:10

Thanks for your PR. We will review it.

kaituoxu, Nov 24 '25 05:11